Grouping based on multiple filters/criteria

2014-08-18 Thread deniz
Is it possible to have multiple filters/criteria on grouping? I am trying to
do something like the tickets below, and judging from their statuses, I am
assuming it isn't possible?

https://issues.apache.org/jira/browse/SOLR-2553
https://issues.apache.org/jira/browse/SOLR-2526
https://issues.apache.org/jira/browse/LUCENE-3257

To make everything clear, here are the details of what I am planning to do
with Solr...

So there is an activity feed on a site, and it basically works like the
Facebook or LinkedIn newsfeed, though there is no relationship between
users: it doesn't matter whether I am following someone or not; as long as
their settings allow me to see their posts and the posts hit my search
filter, I will see them.

The part related to grouping is tricky... so let's assume that you are able
to see my posts, and I have posted 8 activities in the last hour. Those
activities should appear differently from the other posts, as a combined
view of my posts...

i.e.

 <deniz>
   activity one
   activity two
   ...
   activity eight
 </deniz>
 <other user 1>
   single activity
 </other user 1>
 <another user 1>
   single activity
 </another user 1>
 <other user 2>
   activity one
   activity two
 </other user 2>

So here the results should be grouped depending on their post times... 

On Solr (4.7.2), I am indexing activities as documents, and each document
has a bunch of fields including timestamp and source_user etc.

Is it possible to do this on current Solr?

(in case the details are not clear, please feel free to ask for more details
:) )
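
For reference, plain single-field grouping would look like the query below
(just a sketch; host/core are placeholders, and source_user/timestamp are
the fields mentioned above). The catch is that this groups all of a user's
posts into one group regardless of time, so the "combine only the last
hour" behaviour would still need something on top:

http://localhost:8983/solr/collection1/select?q=*:*&group=true&group.field=source_user&group.sort=timestamp+desc&group.limit=10&sort=timestamp+desc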







-
Zeki ama calismiyor... Calissa yapar...


Please help with filtering on group.limit

2014-08-18 Thread Phi Hoang Hai
Dear everyone,
My problem involves 2 queries:
1) Get the top 1 of each group (group.limit=1 AND group.sort=date desc,
group.field=ABC) — sketched below.
2) Filter to keep only the document of each group that matches a condition;
if a group's document doesn't match the condition, remove it from the
result list.
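
A sketch of query 1 (hypothetical host/core, using the ABC and date fields
named above):

http://localhost:8983/solr/corename/select?q=*:*&group=true&group.field=ABC&group.sort=date+desc&group.limit=1

Note that a plain fq will not do query 2, since fq filters individual
documents before grouping rather than dropping whole groups afterwards.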

Help me.
Thank you.

Hải


optimize and .nfsXXXX files

2014-08-18 Thread BorisG
Hi,
I am using Solr 3.6.2.
I use NFS and my index folder is a mounted folder.
When I run the command:
server:port/solr/collection1/update?optimize=true&maxSegments=1&waitFlush=true&expungeDeletes=true
in order to optimize my index, some .nfsXXXX files are created while the
optimize is running.
The problem I am having is that after the optimize finishes its run, the
.nfs files aren't deleted.
When I close the Solr process they immediately disappear.
I don't want to restart the Solr process after each optimize; is there
anything that can be done so that Solr gets rid of those files?

Thanks,






Solr Clustering component different results than Carrot workbench

2014-08-18 Thread Yavar Husain
Though I am interacting with Dawid (the creator of Carrot2) on the Carrot2
mailing list, I just wanted to post my problem to a wider audience.

I am using Solr 4.7 (on both Windows and Linux) and saved my
lingo-attributes.xml file from the workbench, which I am now using in Solr.
Note that for testing I have just one Solr index and all the queries are
fired against it.

Now the clusters that I am getting are good in the workbench (Carrot2) but
pathetic in Solr. In the logs (jetty) I can see:

Loaded Solr resource: clustering/carrot2/lingo-attributes.xml

so that indicates that my attribute file is being loaded.

I am really confused about what accounts for the difference between the two
outputs (workbench vs Solr). Again, to reiterate, the data sources are the
same (just one Solr index and the same queries with 100 results). This is
happening on both Linux and Windows.

Given below is my search component and request handler configuration:

<searchComponent name="clustering"
                 enable="${solr.clustering.enabled:true}"
                 class="solr.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">lingo</str>

    <!-- Class name of a clustering algorithm compatible with the Carrot2
         framework.

         Currently available open source algorithms are:
         * org.carrot2.clustering.lingo.LingoClusteringAlgorithm
         * org.carrot2.clustering.stc.STCClusteringAlgorithm
         * org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm

         See http://project.carrot2.org/algorithms.html for more information.

         A commercial algorithm Lingo3G (needs to be installed separately)
         is defined as:
         * com.carrotsearch.lingo3g.Lingo3GClusteringAlgorithm
    -->
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <str name="LingoClusteringAlgorithm.desiredClusterCountBase">30</str>

    <!-- Override location of the clustering algorithm's resources
         (attribute definitions and lexical resources).

         A directory from which to load algorithm-specific stop words,
         stop labels and attribute definition XMLs.

         For an overview of Carrot2 lexical resources, see:
         http://download.carrot2.org/head/manual/#chapter.lexical-resources

         For an overview of Lingo3G lexical resources, see:
         http://download.carrotsearch.com/lingo3g/manual/#chapter.lexical-resources
    -->
    <str name="carrot.resourcesDir">clustering/carrot2</str>
  </lst>
</searchComponent>

<!-- A request handler for demonstrating the clustering component.

     This is purely an example.

     In reality you will likely want to add the component to your
     already specified request handlers.
-->
<requestHandler name="/clustering"
                enable="${solr.clustering.enabled:true}"
                class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <bool name="clustering.results">true</bool>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <str name="carrot.resourcesDir">clustering/carrot2</str>
    <!-- Field name with the logical "title" of each document (optional) -->
    <str name="carrot.title">film_id</str>
    <!-- Field name with the logical "content" of each document (optional) -->
    <str name="carrot.snippet">description</str>
    <!-- Apply highlighter to the title/content and use this for clustering. -->
    <bool name="carrot.produceSummary">true</bool>
    <!-- The maximum number of labels per cluster -->
    <!-- <int name="carrot.numDescriptions">5</int> -->
    <!-- Produce sub clusters -->
    <bool name="carrot.outputSubClusters">false</bool>
    <str name="rows">100</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>


Re: optimize and .nfsXXXX files

2014-08-18 Thread Michael McCandless
Soft commit (i.e. opening a new IndexReader in Lucene and closing the
old one) should make those go away?

The .nfsXXXX files are created when a file is deleted but a local
process (in this case, the current Lucene IndexReader) still has the
file open.
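
As a sketch (host/core are placeholders; commit=true is the standard
update parameter), an explicit commit that reopens the searcher should let
NFS delete them:

server:port/solr/collection1/update?commit=true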

Mike McCandless

http://blog.mikemccandless.com


On Mon, Aug 18, 2014 at 5:20 AM, BorisG boris.golo...@mail.huji.ac.il wrote:
 Hi,
 I am using Solr 3.6.2.
 I use NFS and my index folder is a mounted folder.
 When I run the command:
 server:port/solr/collection1/update?optimize=true&maxSegments=1&waitFlush=true&expungeDeletes=true
 in order to optimize my index, some .nfsXXXX files are created while the
 optimize is running.
 The problem I am having is that after the optimize finishes its run, the
 .nfs files aren't deleted.
 When I close the Solr process they immediately disappear.
 I don't want to restart the Solr process after each optimize; is there
 anything that can be done so that Solr gets rid of those files?

 Thanks,






Editing http://wiki.apache.org/solr/PublicServers

2014-08-18 Thread ikulcsar

Hi,

My name is Istvan Kulcsar and I would like to edit this page:
http://wiki.apache.org/solr/PublicServers

Here are some Solr-powered search sites:
http://www.odrportal.hu/kereso/
http://idea.unideb.hu/idealista/
http://www.jobmonitor.hu
http://www.profession.hu/
http://webicina.com/
http://www.cylex.hu/
http://kozbeszerzes.ceu.hu/

Thanks for the help.

Greets,
Steve


Re: Retrieving and updating large set of documents on Solr 4.7.2

2014-08-18 Thread Otis Gospodnetic
Hi,

Not sure if you've seen https://issues.apache.org/jira/browse/SOLR-5244 ?

It's not in Solr 4.7.2, but may be a good excuse to update Solr.
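
For the cursor approach mentioned in the quoted mail below, the 4.7+
syntax is roughly (a sketch; core name, sort field and page size are
placeholders, and the sort must include the uniqueKey field):

/solr/collection1/select?q=source_user:deniz&sort=id+asc&rows=1000&cursorMark=*

passing each response's nextCursorMark back as cursorMark for the next page.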

Otis
--
Solr Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Aug 18, 2014 at 4:09 AM, deniz denizdurmu...@gmail.com wrote:



 I am trying to implement an activity feed for a website, and planning to use
 Solr for this case. As it does not have any follower/following relation,
 Solr fits the requirements.

 There is one point which makes me concerned about performance. So as user A,
 I may have 10K activities in the feed, and then I have updated my
 preferences, so the activities that I have posted should be updated too
 (imagine that I am changing my user name, so all of the activities would
 have my new username). In order to update all 10K activities, I need to
 retrieve the unique document ids from Solr, then update them. Retrieving 10K
 docs at once is not a good idea, if you imagine a bunch of other users are
 also doing a similar change. I have checked docs and forums; using Cursors
 on Solr seems ok, but still makes me think about the performance (after id
 retrieval, I need to update each activity).

 Are there any other ways to handle this without Cursors? Or should I rather
 use another tool/backend to have something like a username -> activity_id
 mapping, so I can directly retrieve the ids to update?

 Regards,




 -
 Zeki ama calismiyor... Calissa yapar...



Re: How to search for phrase IAE_UPC_0001

2014-08-18 Thread Paul Rogers
Hi Guys

I've been checking into this further and have deleted the index a couple of
times and rebuilt it with the suggestions you've supplied.

I had a bit of an epiphany last week and decided to check if the document I
was searching for was actually in the index (did this by doing a *:* query
to a file and grep'ing for the 'IAE_UPC_0001' string).  It seems it isn't!!
Not sure if it was in the original index or not, tho' I suspect not.

As far as I can see anything with the reference in the form IAE_UPC_
has not been indexed while those with the reference in the form
IAE-UPC- have.  Not sure if that's a coincidence or not.

Need to see if I can get the docs into the index and then check if the
search works or not.  Will see if the guys on the Nutch list can shed any
light.

All the best.

P


On 4 August 2014 17:09, Jack Krupansky j...@basetechnology.com wrote:

 The standard tokenizer treats underscore as a valid token character, not a
 delimiter.

 The word delimiter filter will treat underscore as a delimiter though.

 Make sure your query-time WDF does not have preserveOriginal=1 - but the
 index-time WDF should have preserveOriginal=1. Otherwise, the query
 phrase will generate an extra token which will participate in the matching
 and might cause a mismatch.

 -- Jack Krupansky

 -Original Message- From: Paul Rogers
 Sent: Monday, August 4, 2014 5:55 PM

 To: solr-user@lucene.apache.org
 Subject: Re: How to search for phrase IAE_UPC_0001

 Hi Guys

 Thanks for the replies.  I've had a look at the WordDelimiterFilterFactory
 and the Term Info for the url field.  It seems that all the terms exist and
 I now understand that each url is being broken up using the delimiters
 specified.  But I think I'm still missing something.

 Am I correct in assuming the minus sign (-) is also a delimiter?

 If so, why then does url:"IAE-UPC-0001" return a result (when the url
 contains the substring IAE-UPC-0001) whereas url:"IAE_UPC_0001" doesn't
 (when the url contains the substring IAE_UPC_0001)?

 Secondly, if the url has indeed been broken into the terms IAE, UPC and 0001,
 why do all the searches suggested or tried succeed when the delimiter is a
 minus sign (-) but not when the delimiter is an underscore (_), returning
 zero matches?

 Finally, shouldn't the query url:"IAE UPC 0001"~1 work since all it is
 looking for is the three terms?

 Many thanks for any enlightenment.

 P




 On 4 August 2014 01:33, Harald Kirsch harald.kir...@raytion.com wrote:

  This all depends on how the tokenizers take your URLs apart. To quickly
 see what ended up in the index, go to a core in the UI, select Schema
 Browser, select the field containing your URLs, click on Load Term Info.

 In your case, for the field holding the URL you could try to switch to a
 tokenizer that defines tokens as a sequence of alphanumeric characters,
 roughly [a-z0-9]+ plus diacritics. In particular punctuation and
 separation
 characters like dash, underscore, slash, dot and the like would never be
 part of a token, i.e. they don't make a difference.

 Then you can search the url parts with a phrase query
 (https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser#TheStandardQueryParser-SpecifyingTermsfortheStandardQueryParser)
 like

  url:"IAE-UPC-0001"

 In the same way as during indexing, the dashes are removed to end up with
 three tokens, namely IAE, UPC and 0001. Further they have to be in that
 order. Naturally this will then match anything like:

   IAE_UPC_0001
   IAE UPC 0001
   IAE/UPC+0001
   IAE\UPC\0001
   IAE.UPC,0001

 Depending on how your URLs are structured, there is the chance for false
 positives, of course.

 The Really Good Thing here is, that you don't need to use wildcards.

 I have not yet looked at the wildcard-queries implementation in
 Solr/Lucene, but with the commercial search engines I know, they are a
 great way to lose the confidence of your users, because they just don't
 work as expected by anyone not knowing the implementation. Either they
 deliver only partial results, or they kill the performance, or they even go
 OOM. If the Solr committers have not done something really ingenious,
 Solr/Lucene will have the same problems.

 Harald.






 On 31.07.2014 18:31, Paul Rogers wrote:

  Hi Guys

 I have a Solr application searching on data uploaded by Nutch.  The search
 I wish to carry out is for a particular document reference contained within
 the url field, e.g. IAE-UPC-0001.

 The problem is that the file names that comprise the urls are not
 consistent, so a url might contain the reference as IAE-UPC-0001 or
 IAE_UPC_0001 (i.e. using either the minus or underscore as the delimiter),
 but not both.

 I have created the query (in the solr admin interface):

 url:"IAE-UPC-0001"

 which works (returning the single expected document), as do:

 url:IAE*UPC*0001
 url:IAE?UPC?0001

 when the doc ref is in the format IAE-UPC-0001 (i.e. using the minus sign
 as a delimiter).

 However:


Combining a String Tag with a Numeric Value

2014-08-18 Thread Dave Seltzer
Hello!

I have some new entity data that I'm indexing which takes the form of:

String: EntityString
Float: Confidence

I want to add these to a generic Tags field (for faceting), but I'm not
sure how to hold onto the confidence. Token Payloads seem like one method,
but then I'm not sure how to extract the Payload.

Alternatively I could create two fields: TagIndexed which stores just the
string value and TagStored which contains a delimited String|Float.

What's the right way to do this?

Thanks!

-D


Re: Editing http://wiki.apache.org/solr/PublicServers

2014-08-18 Thread Erick Erickson
Steve:

Sure. What we need in order to add you to the contributors group is
your Wiki logon. Provide us with that and we'll
add you ASAP.

Best,
Erick

On Mon, Aug 18, 2014 at 3:14 AM,  ikulc...@precognox.com wrote:
 Hi,

 My name is Istvan Kulcsar and I would like to edit this page:
 http://wiki.apache.org/solr/PublicServers

 Here are some Solr-powered search sites:
 http://www.odrportal.hu/kereso/
 http://idea.unideb.hu/idealista/
 http://www.jobmonitor.hu
 http://www.profession.hu/
 http://webicina.com/
 http://www.cylex.hu/
 http://kozbeszerzes.ceu.hu/

 Thanks for the help.

 Greets,
 Steve


Re: How to search for phrase IAE_UPC_0001

2014-08-18 Thread Erick Erickson
I'd pull Nutch out of the mix here as a test. Create
some test docs (use the exampleDocs directory?) and
go from there, at least long enough to ensure that Solr
does what you expect if the data gets there properly.

You can set this up in about 10 minutes, and test it
in about 15 more. May save you endless hours.

Because you're conflating two issues here:
1) whether Nutch is sending the data
2) whether Solr is indexing and searching as you expect.

Some of the Solr/Lucene analysis chains do transformations
that may not be what you assume, particularly things
like StandardTokenizer and WordDelimiterFilterFactory.

So I'd take the time to see that the values you're dealing
with are behaving as you expect. The admin/analysis page
will help you a _lot_ here.
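
For instance, echoing Jack's earlier advice, an index-time chain that keeps
the original token would look roughly like this (a sketch only; the type
name is hypothetical):

<fieldType name="text_url" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- preserveOriginal keeps IAE_UPC_0001 as a whole token
         in addition to splitting it on the underscores -->
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
            generateWordParts="1" generateNumberParts="1"/>
  </analyzer>
</fieldType>

with the query-time analyzer using preserveOriginal="0", per Jack's note.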

Best,
Erick




On Mon, Aug 18, 2014 at 7:16 AM, Paul Rogers paul.roge...@gmail.com wrote:
 Hi Guys

 I've been checking into this further and have deleted the index a couple of
 times and rebuilt it with the suggestions you've supplied.

 I had a bit of an epiphany last week and decided to check if the document I
 was searching for was actually in the index (did this by doing a *:* query
 to a file and grep'ing for the 'IAE_UPC_0001' string).  It seems it isn't!!
 Not sure if it was in the original index or not, tho' I suspect not.

 As far as I can see anything with the reference in the form IAE_UPC_
 has not been indexed while those with the reference in the form
 IAE-UPC- have.  Not sure if that's a coincidence or not.

 Need to see if I can get the docs into the index and then check if the
 search works or not.  Will see if the guys on the Nutch list can shed any
 light.

 All the best.

 P


 On 4 August 2014 17:09, Jack Krupansky j...@basetechnology.com wrote:

 The standard tokenizer treats underscore as a valid token character, not a
 delimiter.

 The word delimiter filter will treat underscore as a delimiter though.

 Make sure your query-time WDF does not have preserveOriginal=1 - but the
 index-time WDF should have preserveOriginal=1. Otherwise, the query
 phrase will generate an extra token which will participate in the matching
 and might cause a mismatch.

 -- Jack Krupansky

 -Original Message- From: Paul Rogers
 Sent: Monday, August 4, 2014 5:55 PM

 To: solr-user@lucene.apache.org
 Subject: Re: How to search for phrase IAE_UPC_0001

 Hi Guys

 Thanks for the replies.  I've had a look at the WordDelimiterFilterFactory
 and the Term Info for the url field.  It seems that all the terms exist and
 I now understand that each url is being broken up using the delimiters
 specified.  But I think I'm still missing something.

 Am I correct in assuming the minus sign (-) is also a delimiter?

 If so, why then does url:"IAE-UPC-0001" return a result (when the url
 contains the substring IAE-UPC-0001) whereas url:"IAE_UPC_0001" doesn't
 (when the url contains the substring IAE_UPC_0001)?

 Secondly, if the url has indeed been broken into the terms IAE, UPC and 0001,
 why do all the searches suggested or tried succeed when the delimiter is a
 minus sign (-) but not when the delimiter is an underscore (_), returning
 zero matches?

 Finally, shouldn't the query url:"IAE UPC 0001"~1 work since all it is
 looking for is the three terms?

 Many thanks for any enlightenment.

 P




 On 4 August 2014 01:33, Harald Kirsch harald.kir...@raytion.com wrote:

  This all depends on how the tokenizers take your URLs apart. To quickly
 see what ended up in the index, go to a core in the UI, select Schema
 Browser, select the field containing your URLs, click on Load Term Info.

 In your case, for the field holding the URL you could try to switch to a
 tokenizer that defines tokens as a sequence of alphanumeric characters,
 roughly [a-z0-9]+ plus diacritics. In particular punctuation and
 separation
 characters like dash, underscore, slash, dot and the like would never be
 part of a token, i.e. they don't make a difference.

 Then you can search the url parts with a phrase query
 (https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser#TheStandardQueryParser-SpecifyingTermsfortheStandardQueryParser)
 like

  url:"IAE-UPC-0001"

 In the same way as during indexing, the dashes are removed to end up with
 three tokens, namely IAE, UPC and 0001. Further they have to be in that
 order. Naturally this will then match anything like:

   IAE_UPC_0001
   IAE UPC 0001
   IAE/UPC+0001
   IAE\UPC\0001
   IAE.UPC,0001

 Depending on how your URLs are structured, there is the chance for false
 positives, of course.

 The Really Good Thing here is, that you don't need to use wildcards.

 I have not yet looked at the wildcard-queries implementation in
 Solr/Lucene, but with the commercial search engines I know, they are a
 great way to lose the confidence of your users, because they just don't
 work as expected by anyone not knowing the implementation. Either they
 deliver only partial results, or they kill the performance, or they even go
 OOM. If 

Re: Combining a String Tag with a Numeric Value

2014-08-18 Thread Erick Erickson
Hmmm, there's no particular right way. It'd be simpler
to index these as two separate fields _if_ there's only
one pair per document. If there are more and you index them
as two multiValued fields, there's no good way at _query_ time
to retain the association. The returned multiValued fields are
guaranteed to be in the same order of insertion so you can
display the correct pairs, but you can't use the association
to score docs. Hmmm, somewhat abstract. OK say you want to
associate two tag/value pairs, tag1:50 and tag2:100. Say further
that you have two multiValued fields, Tags and Values and then
index tag1 and tag2 into Tags and 50 and 100 into Values.
There's no good way to express q=tags:tag1 and factor the
associated value of 50 into the score

Note that the returned _values_ will be
Tags:   tag1 tag2
Values: 50   100

So at that point you can see the associations.

That said, if there's only _one_ such tag/value pair per document,
it's easy to write a FunctionQuery (http://wiki.apache.org/solr/FunctionQuery)
that does this.

***

If you have many tag/value pairs, payloads are probably what you want.
Here's an end-to-end example:

http://searchhub.org/2014/06/13/end-to-end-payload-example-in-solr/
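
The indexing side of that is just delimited tokens in a field whose index
analyzer includes a DelimitedPayloadTokenFilter; a minimal SolrJ sketch
(the field name and URL are hypothetical):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc1");
// whitespace-separated token|payload pairs; the payload filter stores
// 0.85 and 0.92 as the payloads of tag1 and tag2
doc.addField("tags_payload", "tag1|0.85 tag2|0.92");
server.add(doc);
server.commit();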

Best,
Erick

On Mon, Aug 18, 2014 at 7:32 AM, Dave Seltzer dselt...@tveyes.com wrote:
 Hello!

 I have some new entity data that I'm indexing which takes the form of:

 String: EntityString
 Float: Confidence

 I want to add these to a generic Tags field (for faceting), but I'm not
 sure how to hold onto the confidence. Token Payloads seem like one method,
 but then I'm not sure how to extract the Payload.

 Alternatively I could create two fields: TagIndexed which stores just the
 string value and TagStored which contains a delimited String|Float.

 What's the right way to do this?

 Thanks!

 -D


need help in field collapsing

2014-08-18 Thread Sankalp Gupta
Hi

I have about 15 fields in my Solr schema, but there are two fields, let's
call them field1 and field2. For most searches I feel I have a
perfect schema, but for one use case it is not apt:
*problem*: I have to group by field1, and then search for
a particular value "a" in field1 only when "b" is not present in any
instance of field2 within the respective group (same as using HAVING after
GROUP BY in MySQL). Is there a way to do this in Solr, or do I have to
maintain a separate schema for this (which would be a very costly operation
for us)?
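
The closest plain query (hypothetical host/core) would be:

http://localhost:8983/solr/corename/select?q=field1:a&fq=-field2:b&group=true&group.field=field1

but note this only approximates HAVING: the fq removes the individual
documents containing "b" before grouping, rather than dropping every group
that contains any "b" document.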

Thanks in advance


solr cloud going down repeatedly

2014-08-18 Thread Jakov Sosic

Hi guys.

I have a Solr cloud consisting of 3 ZooKeeper VMs running 3.4.5 
backported from Ubuntu 14.04 LTS to 12.04 LTS.


They are orchestrating 4 Solr nodes, which have 2 cores. Each core is 
sharded, so 1 shard is on each of the Solr nodes.


Solr runs under Tomcat 7 and Ubuntu's latest OpenJDK 7.

Version of solr is 4.2.1.

Each of the nodes has around 7GB of data, and the JVM is set to run an 8GB 
heap. All Solr nodes have 16GB RAM.



A few weeks back we started having issues with this installation. Tomcat 
was filling up catalina.out with the following messages:


SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:


The only solution was to restart all 4 Tomcats on the 4 Solr nodes. After 
that, the issue would rectify itself, but would occur again approximately 
a week after a restart.


This happened last time yesterday, and I succeeded in recording some of 
the stuff happening on the boxes via Zabbix and atop.



Basically at 15:35 the load on the machine went berserk, jumping from 
around 0.5 to around 30+.


Zabbix and atop didn't notice any heavy IO; all the other processes were 
practically idle, only the JVM (Tomcat) exploded, with CPU usage increasing 
from the standard ~80% to around ~750%.


These are parts of the atop recordings on one of the nodes. Note that 
they are 10 mins apart:


(15:28:42)
CPL | avg1  0.12 | avg5  0.36 | avg15  0.38 |

(15:38:42)
CPL | avg1  8.54 | avg5  3.62 | avg15  1.61 |

(15:48:42)
CPL | avg1 30.14 | avg5 27.09 | avg15 14.73 |



This is the status of Tomcat at the last point (15:48:42):
28891  tomcat7  tomcat7  411  8.68s  70m14s  209.9M  204K  0K  5804K  --  -  S  5704%  java



I have noticed similar stuff happening on the other Solr nodes. At 17:41 
the on-call person decided to hard reset all the Solr nodes, and the cloud 
came back up running normally after that.


These are the logs that I found on first node:

Aug 17, 2014 3:44:58 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:

Aug 17, 2014 3:46:12 PM 
org.apache.solr.cloud.OverseerCollectionProcessor run

WARNING: Overseer cannot talk to ZK
Aug 17, 2014 3:46:12 PM 
org.apache.solr.cloud.Overseer$ClusterStateUpdater amILeader

WARNING:
org.apache.zookeeper.KeeperException$SessionExpiredException: 
KeeperErrorCode = Session expired for /overseer_elect/leader


Then a bunch of :

Aug 17, 2014 3:46:42 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:

until the server was rebooted.


On other nodes I can see:
node2:

Aug 17, 2014 3:44:58 PM org.apache.solr.cloud.RecoveryStrategy close
WARNING: Stopping recovery for 
zkNodeName=10.100.254.103:8080_solr_myappcore=myapp

Aug 17, 2014 3:44:58 PM org.apache.solr.cloud.RecoveryStrategy close
WARNING: Stopping recovery for 
zkNodeName=10.100.254.103:8080_solr_myapp2core=myapp2

Aug 17, 2014 3:46:24 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: 
org.apache.solr.client.solrj.SolrServerException: IOException occured 
when talking to server at: http://node1:8080/solr/myapp


node4:

Aug 17, 2014 3:44:06 PM org.apache.solr.cloud.RecoveryStrategy close
WARNING: Stopping recovery for 
zkNodeName=10.100.254.105:8080_solr_myapp2core=myapp2

Aug 17, 2014 3:44:09 PM org.apache.solr.cloud.RecoveryStrategy close
WARNING: Stopping recovery for 
zkNodeName=10.100.254.105:8080_solr_myappcore=myapp

Aug 17, 2014 3:45:37 PM org.apache.solr.common.SolrException log
SEVERE: There was a problem finding the leader in 
zk:org.apache.solr.common.SolrException: Could not get leader props





My impression is that garbage collector is at fault here.

This is the cmdline of tomcat:

/usr/lib/jvm/java-7-openjdk-amd64/bin/java 
-Djava.util.logging.config.file=/var/lib/tomcat7/conf/logging.properties 
-Djava.awt.headless=true -Xmx8192m -XX:+UseConcMarkSweepGC -DnumShards=2 
-Djetty.port=8080 
-DzkHost=10.215.1.96:2181,10.215.1.97:2181,10.215.1.98:2181 
-javaagent:/opt/newrelic/newrelic.jar -Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.port=9010 
-Dcom.sun.management.jmxremote.local.only=false 
-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.ssl=false 
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager 
-Djava.endorsed.dirs=/usr/share/tomcat7/endorsed -classpath 
/usr/share/tomcat7/bin/bootstrap.jar:/usr/share/tomcat7/bin/tomcat-juli.jar 
-Dcatalina.base=/var/lib/tomcat7 -Dcatalina.home=/usr/share/tomcat7 
-Djava.io.tmpdir=/tmp/tomcat7-tomcat7-tmp 
org.apache.catalina.startup.Bootstrap start



So, I am using the ConcMarkSweepGC collector.

Do you have any suggestions on how I can debug this further and potentially 
eliminate the issue causing the downtimes?


Re: How to restore an index from a backup over HTTP

2014-08-18 Thread Jeff Wartes

I'm able to do cross-solrcloud-cluster index copy using nothing more than
careful use of the "fetchindex" replication handler command.

I'm using this as a build/deployment tool, so I manually create a
collection in two clusters, index into one, test, and then ask the other
cluster to fetchindex from it on each shard/replica.

Some caveats:
  1. It seems like fetchindex may silently decline if it thinks the index
it has is newer.
  2. I'm not doing this on an index that's currently receiving updates.
  3. SolrCloud replication doesn't come into this flow, even if you
fetchindex on a leader. (although once you're done, updates should get
replicated normally)
  4. Both collections must be created with the same number of shards and
sharding mechanism. (although replication factor can vary)
 

I've got a tool for automating this that I'd like to push to github at
some point, let me know if you're interested.
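
For reference, the call looks roughly like this (hypothetical hosts/core;
masterUrl points at the source core's replication handler):

http://targethost:8983/solr/collection1/replication?command=fetchindex&masterUrl=http://sourcehost:8983/solr/collection1/replication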





On 8/16/14, 3:03 AM, Greg Solovyev g...@zimbra.com wrote:

Thanks Shawn, this is a pretty cool idea. Adding the handler seems pretty
straight forward, but the main concern I have is the internal data format
that ReplicationHandler and SnapPuller use. This new handler as well as
the code that I've already written to download the index files from Solr
will depend on that format. Unfortunately, this format is not documented
and is not abstracted by SolrJ, so I wonder what I can do to make sure it
does not change on us without notice.

Thanks,
Greg

- Original Message -
From: Shawn Heisey s...@elyograg.org
To: solr-user@lucene.apache.org
Sent: Friday, August 15, 2014 7:31:19 PM
Subject: Re: How to restore an index from a backup over HTTP

On 8/15/2014 5:51 AM, Greg Solovyev wrote:
 What I want to achieve is being able to send the backed up index to
Solr (either standalone or with ZooKeeper) in a way similar to creating
a new Collection, i.e. create a new collection and upload an existing
index directly into that Collection. I've looked through Solr code and
so far I have not found a handler that would allow this scenario. So,
the last idea is to implement a special handler for this case, perhaps
extending CoreAdminHandler. ReplicationHandler together with SnapPuller
do pretty much what I need to do, except that the action has to be
initiated by the receiving Solr server and I need to initiate the action
externally. I.e., instead of having Solr slave download an index from
Solr master, I need to feed the index to Solr master and ideally this
would work the same way in standalone and SolrCloud modes.

I have not made any attempt to verify what I'm stating below.  It may
not work.

What I think I would *try* is setting up a standalone Solr (no cloud) on
the backup server.  Use scripted index/config copies and Solr start/stop
actions to get the index up and running on a known core in the
standalone Solr.  Then use the replication handler's HTTP API to
replicate the index from that standalone server to each of the replicas
in your cluster.

https://wiki.apache.org/solr/SolrReplication#HTTP_API
https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler

One thing that I do not know is whether SolrCloud itself might interfere
with these actions, or whether it might automatically take care of
additional replicas if you replicate to the shard leader.  If SolrCloud
*would* interfere, then this idea might need special support in
SolrCloud, perhaps as an extension to the Collections API.  If it won't
interfere, then the use-case would need to be documented (on the user
wiki at a minimum) so that committers will be aware of it and preserve
the capability in future versions.  An extension to the Collections API
might be a good idea either way -- I've seen a number of questions about
capability that falls under this basic heading.

Thanks,
Shawn



Re: How to restore an index from a backup over HTTP

2014-08-18 Thread Greg Solovyev
Thanks Jeff, I'd be interested in taking a look at the code for this tool. My 
github ID is grishick.

Thanks,
Greg

- Original Message -
From: Jeff Wartes jwar...@whitepages.com
To: solr-user@lucene.apache.org
Sent: Monday, August 18, 2014 9:49:28 PM
Subject: Re: How to restore an index from a backup over HTTP

I'm able to do cross-solrcloud-cluster index copy using nothing more than
careful use of the "fetchindex" replication handler command.

I'm using this as a build/deployment tool, so I manually create a
collection in two clusters, index into one, test, and then ask the other
cluster to fetchindex from it on each shard/replica.

Some caveats:
  1. It seems like fetchindex may silently decline if it thinks the index
it has is newer.
  2. I'm not doing this on an index that's currently receiving updates.
  3. SolrCloud replication doesn't come into this flow, even if you
fetchindex on a leader. (although once you're done, updates should get
replicated normally)
  4. Both collections must be created with the same number of shards and
sharding mechanism. (although replication factor can vary)
 

I've got a tool for automating this that I'd like to push to github at
some point, let me know if you're interested.





On 8/16/14, 3:03 AM, Greg Solovyev g...@zimbra.com wrote:

Thanks Shawn, this is a pretty cool idea. Adding the handler seems pretty
straight forward, but the main concern I have is the internal data format
that ReplicationHandler and SnapPuller use. This new handler as well as
the code that I've already written to download the index files from Solr
will depend on that format. Unfortunately, this format is not documented
and is not abstracted by SolrJ, so I wonder what I can do to make sure it
does not change on us without notice.

Thanks,
Greg

- Original Message -
From: Shawn Heisey s...@elyograg.org
To: solr-user@lucene.apache.org
Sent: Friday, August 15, 2014 7:31:19 PM
Subject: Re: How to restore an index from a backup over HTTP

On 8/15/2014 5:51 AM, Greg Solovyev wrote:
 What I want to achieve is being able to send the backed up index to
Solr (either standalone or with ZooKeeper) in a way similar to creating
a new Collection, i.e. create a new collection and upload an existing
index directly into that Collection. I've looked through Solr code and
so far I have not found a handler that would allow this scenario. So,
the last idea is to implement a special handler for this case, perhaps
extending CoreAdminHandler. ReplicationHandler together with SnapPuller
do pretty much what I need to do, except that the action has to be
initiated by the receiving Solr server and I need to initiate the action
externally. I.e., instead of having Solr slave download an index from
Solr master, I need to feed the index to Solr master and ideally this
would work the same way in standalone and SolrCloud modes.

I have not made any attempt to verify what I'm stating below.  It may
not work.

What I think I would *try* is setting up a standalone Solr (no cloud) on
the backup server.  Use scripted index/config copies and Solr start/stop
actions to get the index up and running on a known core in the
standalone Solr.  Then use the replication handler's HTTP API to
replicate the index from that standalone server to each of the replicas
in your cluster.

https://wiki.apache.org/solr/SolrReplication#HTTP_API
https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler

One thing that I do not know is whether SolrCloud itself might interfere
with these actions, or whether it might automatically take care of
additional replicas if you replicate to the shard leader.  If SolrCloud
*would* interfere, then this idea might need special support in
SolrCloud, perhaps as an extension to the Collections API.  If it won't
interfere, then the use-case would need to be documented (on the user
wiki at a minimum) so that committers will be aware of it and preserve
the capability in future versions.  An extension to the Collections API
might be a good idea either way -- I've seen a number of questions about
capability that falls under this basic heading.

Thanks,
Shawn


Re: How to restore an index from a backup over HTTP

2014-08-18 Thread Greg Solovyev
Shawn, the format that I am referencing is filestream, which starts with 2 
bytes carrying file size, then 4 bytes carrying checksum (optional) and then 
the actual bits of the file.

Thanks,
Greg

- Original Message -
From: Shawn Heisey s...@elyograg.org
To: solr-user@lucene.apache.org
Sent: Sunday, August 17, 2014 12:28:12 AM
Subject: Re: How to restore an index from a backup over HTTP

On 8/16/2014 4:03 AM, Greg Solovyev wrote:
 Thanks Shawn, this is a pretty cool idea. Adding the handler seems pretty 
 straight forward, but the main concern I have is the internal data format 
 that ReplicationHandler and SnapPuller use. This new handler as well as the 
 code that I've already written to download the index files from Solr will 
 depend on that format. Unfortunately, this format is not documented and is 
 not abstracted by SolrJ, so I wonder what I can do to make sure it does not 
 change on us without notice.

I am not really sure what format you're referencing here, but I'm about
99% sure the format *over the wire* is javabin.  When the javabin format
changed between 1.4.1 and 3.1.0, replication between those versions
became impossible.

Historical: The Solr version made a huge leap after the Solr and Lucene
development was merged -- it was synchronized with the Lucene version.
There are no 1.5, 2.x, or 3.0 versions of Solr.

https://issues.apache.org/jira/browse/SOLR-2204

Thanks,
Shawn


Re: solr cloud going down repeatedly

2014-08-18 Thread Shawn Heisey
On 8/18/2014 11:30 AM, Jakov Sosic wrote:
 My impression is that garbage collector is at fault here.

 This is the cmdline of tomcat:

 /usr/lib/jvm/java-7-openjdk-amd64/bin/java
 -Djava.util.logging.config.file=/var/lib/tomcat7/conf/logging.properties
 -Djava.awt.headless=true -Xmx8192m -XX:+UseConcMarkSweepGC
 -DnumShards=2 -Djetty.port=8080
 -DzkHost=10.215.1.96:2181,10.215.1.97:2181,10.215.1.98:2181
 -javaagent:/opt/newrelic/newrelic.jar -Dcom.sun.management.jmxremote
 -Dcom.sun.management.jmxremote.port=9010
 -Dcom.sun.management.jmxremote.local.only=false
 -Dcom.sun.management.jmxremote.authenticate=false
 -Dcom.sun.management.jmxremote.ssl=false
 -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
 -Djava.endorsed.dirs=/usr/share/tomcat7/endorsed -classpath
 /usr/share/tomcat7/bin/bootstrap.jar:/usr/share/tomcat7/bin/tomcat-juli.jar
 -Dcatalina.base=/var/lib/tomcat7 -Dcatalina.home=/usr/share/tomcat7
 -Djava.io.tmpdir=/tmp/tomcat7-tomcat7-tmp
 org.apache.catalina.startup.Bootstrap start

With an 8GB heap and UseConcMarkSweepGC as your only GC tuning, I can
pretty much guarantee that you'll see occasional GC pauses of 10-15
seconds, because I saw exactly that happening with my own setup.

This is what I use now:

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

I can't claim that my problem is 100% solved, but collections that go
over one second are *very* rare now, and I'm pretty sure they are all
under two seconds.
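
For illustration only (these are standard HotSpot CMS options, not
necessarily the exact list on the wiki page), tuned startup flags typically
look more like:

-Xms8192m -Xmx8192m
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+CMSParallelRemarkEnabled

rather than relying on the collector's defaults.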

Thanks,
Shawn



Need details on this query

2014-08-18 Thread bbarani
Hi,

This might be a silly question..

I came across the query below online, but I couldn't really understand the
bolded part. Can someone help me understand this part of the query?

deviceType_:Cell OR deviceType_:Prepaid *OR (phone
-data_source_name:("Catalog" OR "Device How To - Interactive" OR "Device How
To - StepByStep"))*

Thanks,
Barani





how to generate stats based on time segments?

2014-08-18 Thread abhayd
hi,
I have a dataset in Solr like:

   id|time|price|
   1|t0|100|
   1|t1|10|
   1|t2|20|
   1|t3|30|

What I want is: when I query Solr for time > t0, I want to return data like
t0, 100
rest, 60 (which is the sum of price for t1,t2,t3)

Is that something that can be done?
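
If sums like that are all that's needed, the StatsComponent can produce
them for an arbitrary filter (a sketch; the core name and the literal time
values are placeholders):

http://localhost:8983/solr/corename/select?q=id:1&fq=time:[t1 TO *]&stats=true&stats.field=price&rows=0

Here the stats block's sum for price would be 60; the t0 row itself would
come from a separate query.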






faceted query with stats not working in solrj

2014-08-18 Thread tedsolr
Hi. I have a query that works just fine in the browser. It rolls up documents
by the facet field and gives me stats on the stats field:

http://localhost:8983/solr/corename/select?q=*:*&stats=on&stats.field=Spend&stats.facet=Supplier

Posting this works just fine. However I cannot get stats from SolrJ or the
solr admin console. From the admin console (on the Query tab) I see:
<str name="msg">can not use FieldCache on a field which is neither indexed
nor has doc values: Supplier?wt=xml</str>

Both Spend and Supplier are indexed. The error must be referring to
something else.

In Java, I use
query.addStatsFieldFacets("Spend", "Supplier");
but the stats object comes back null:
response.getFieldStatsInfo() == null

Thanks so much for any suggestions.
using solr 4.9





Re: faceted query with stats not working in solrj

2014-08-18 Thread Shawn Heisey
On 8/18/2014 12:47 PM, tedsolr wrote:
 Hi. I have a query that works just fine in the browser. It rolls up documents
 by the facet field and gives me stats on the stats field:

 http://localhost:8983/solr/corename/select?q=*:*&stats=on&stats.field=Spend&stats.facet=Supplier

 Posting this works just fine. However I cannot get stats from SolrJ or the
 solr admin console. From the admin console (on the Query tab) I see:
 <str name="msg">can not use FieldCache on a field which is neither indexed
 nor has doc values: Supplier?wt=xml</str>

 Both Spend and Supplier are indexed. The error must be referring to
 something else.

 In Java, I use
 query.addStatsFieldFacets("Spend", "Supplier");
 but the stats object comes back null:
 response.getFieldStatsInfo() == null

I won't claim to know how the stats stuff works, but one thing to do is
make sure Solr is logging at the INFO level or finer, then look at the
Solr log to see what the differences are in the actual query that Solr
is receiving when you do it in the browser and when you do it with
SolrJ.  You will need to look at the actual log file, not the logging
tab in the admin UI.  When using the example included in the Solr
download, the logfile is at logs/solr.log.   If you're using another
method for starting Solr, that may be different.
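
One concrete difference to look for (a sketch, assuming SolrJ 4.x's
SolrQuery API): addStatsFieldFacets only adds the stats.facet parameter, so
stats also has to be switched on for the field, e.g.:

SolrQuery query = new SolrQuery("*:*");
query.setGetFieldStatistics("Spend");           // adds stats=true and stats.field=Spend
query.addStatsFieldFacets("Spend", "Supplier"); // adds f.Spend.stats.facet=Supplier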

Thanks,
Shawn



Currency field type not supported for stats

2014-08-18 Thread tedsolr
Just looking for confirmation that the currency field is not supported for
stats. When I use a currency field as the stats.field I get this error:

http://localhost:8983/solr/corename/select?q=*:*&stats=on&stats.field=SpendAsCurrency&stats.facet=Supplier

Field type
currency{class=org.apache.solr.schema.CurrencyField,analyzer=org.apache.solr.schema.FieldType$DefaultAnalyzer,args={precisionStep=8,
multiValued=false, currencyConfig=currency.xml, defaultCurrency=USD,
class=solr.CurrencyField}} is not currently supported

When I run stats on a long type it works fine. I can of course work around
this by modifying my schema. So is currency not a numeric type in Solr?

thanks 





Re: How to search for phrase IAE_UPC_0001

2014-08-18 Thread Paul Rogers
Hi Erick

Thanks for the assist.  Did as you suggested (tho' I used Nutch).  Cleared
out Solr's index and Nutch's crawl DB, and then emptied all the documents
out of the web server bar 10 of each type (IAE-UPC- and IAE_UPC_).
Then crawled the site using Nutch.

Then confirmed that all 20 docs had been uploaded and that a *:* search
returned all 20 docs.

Now when I do a url search on either (for example) q=url:IAE-UPC-220 or
q=url:IAE_UPC_0001, I get a result returned for each, i.e. it now
works as expected.

So seems I now need to figure out why Nutch isn't crawling the documents.

Again many thanks.

P




On 18 August 2014 11:22, Erick Erickson erickerick...@gmail.com wrote:

 I'd pull Nutch out of the mix here as a test. Create
 some test docs (use the exampleDocs directory?) and
 go from there, at least long enough to ensure that Solr
 does what you expect if the data gets there properly.

 You can set this up in about 10 minutes, and test it
 in about 15 more. May save you endless hours.

 Because you're conflating two issues here:
 1) whether Nutch is sending the data
 2) whether Solr is indexing and searching as you expect.

 Some of the Solr/Lucene analysis chains do transformations
 that may not be what you assume, particularly things
 like StandardTokenizer and WordDelimiterFilterFactory.

 So I'd take the time to see that the values you're dealing
 with are behaving as you expect. The admin/analysis page
 will help you a _lot_ here.

 Best,
 Erick




 On Mon, Aug 18, 2014 at 7:16 AM, Paul Rogers paul.roge...@gmail.com
 wrote:
  Hi Guys
 
  I've been checking into this further and have deleted the index a couple
 of
  times and rebuilt it with the suggestions you've supplied.
 
  I had a bit of an epiphany last week and decided to check if the document I
  was searching for was actually in the index (did this by doing a *:* query
  to a file and grep'ing for the 'IAE_UPC_0001' string).  It seems it isn't!!
  Not sure if it was in the original index or not, tho' I suspect not.
 
  As far as I can see anything with the reference in the form IAE_UPC_
  has not been indexed while those with the reference in the form
  IAE-UPC- have.  Not sure if that's a coincidence or not.
 
  Need to see if I can get the docs into the index and then check if the
  search works or not.  Will see if the guys on the Nutch list can shed any
  light.
 
  All the best.
 
  P
 
 
  On 4 August 2014 17:09, Jack Krupansky j...@basetechnology.com wrote:
 
  The standard tokenizer treats underscore as a valid token character,
 not a
  delimiter.
 
  The word delimiter filter will treat underscore as a delimiter though.
 
  Make sure your query-time WDF does not have preserveOriginal=1 - but
 the
  index-time WDF should have preserveOriginal=1. Otherwise, the query
  phrase will generate an extra token which will participate in the
 matching
  and might cause a mismatch.
 
  -- Jack Krupansky
 
  -Original Message- From: Paul Rogers
  Sent: Monday, August 4, 2014 5:55 PM
 
  To: solr-user@lucene.apache.org
  Subject: Re: How to search for phrase IAE_UPC_0001
 
  Hi Guys
 
  Thanks for the replies.  I've had a look at the
 WordDelimiterFilterFactory
  and the Term Info for the url field.  It seems that all the terms exist
 and
  I now understand that each url is being broken up using the delimiters
  specified.  But I think I'm still missing something.
 
  Am I correct in assuming the minus sign (-) is also a delimiter?

  If so, why then does url:"IAE-UPC-0001" return a result (when the url
  contains the substring IAE-UPC-0001) whereas url:"IAE_UPC_0001" doesn't
  (when the url contains the substring IAE_UPC_0001)?

  Secondly, if the url has indeed been broken into the terms IAE, UPC and 0001,
  why do all the searches suggested or tried succeed when the delimiter is a
  minus sign (-) but not when the delimiter is an underscore (_), returning
  zero matches?

  Finally, shouldn't the query url:"IAE UPC 0001"~1 work since all it is
  looking for is the three terms?
 
  Many thanks for any enlightenment.
 
  P
 
 
 
 
  On 4 August 2014 01:33, Harald Kirsch harald.kir...@raytion.com
 wrote:
 
   This all depends on how the tokenizers take your URLs apart. To quickly
  see what ended up in the index, go to a core in the UI, select Schema
  Browser, select the field containing your URLs, click on Load Term
 Info.
 
  In your case, for the field holding the URL you could try to switch to
 a
  tokenizer that defines tokens as a sequence of alphanumeric characters,
  roughly [a-z0-9]+ plus diacritics. In particular punctuation and
  separation
  characters like dash, underscore, slash, dot and the like would never
 be
  part of a token, i.e. they don't make a difference.
 
  Then you can search the url parts with a phrase query
  (https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser#TheStandardQueryParser-SpecifyingTermsfortheStandardQueryParser)
  like
 
   

logging in solr

2014-08-18 Thread M, Arjun (NSN - IN/Bangalore)
Hi,

Currently in my component Solr is logging to catalina.out. What is the
configuration needed to redirect those logs to some custom logfile, e.g. solr.log?

Thanks...

--Arjun



Re: Combining a String Tag with a Numeric Value

2014-08-18 Thread Dave Seltzer
Thanks Erick,

I'm not sure I need to score the documents based on the numeric value, but
I am interested in being able to calculate the average (Mean) of all the
numeric values for a given tag. For example, what is the average confidence
of Tag1 across all documents.

I'm not sure I can do that without building a FunctionQuery.

-Dave


On Mon, Aug 18, 2014 at 12:46 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Hmmm, there's no particular right way. It'd be simpler
 to index these as two separate fields _if_ there's only
 one pair per document. If there are more and you index them
 as two multiValued fields, there's no good way at _query_ time
 to retain the association. The returned multiValued fields are
 guaranteed to be in the same order of insertion so you can
 display the correct pairs, but you can't use the association
 to score docs. Hmmm, somewhat abstract. OK say you want to
 associate two tag/value pairs, tag1:50 and tag2:100. Say further
 that you have two multiValued fields, Tags and Values and then
 index tag1 and tag2 into Tags and 50 and 100 into Values.
 There's no good way to express q=tags:tag1 and factor the
 associated value of 50 into the score

 Note that the returned _values_ will be
 Tags:   tag1 tag2
 Values: 50   100

 So at that point you can see the associations.

 That said, if there's only _one_ such tag/value pair per document,
 it's easy to write a FunctionQuery (
 http://wiki.apache.org/solr/FunctionQuery)
 that does this.

 ***

 If you have many tag/value pairs, payloads are probably what you want.
 Here's an end-to-end example:

 http://searchhub.org/2014/06/13/end-to-end-payload-example-in-solr/

 Best,
 Erick

 On Mon, Aug 18, 2014 at 7:32 AM, Dave Seltzer dselt...@tveyes.com wrote:
  Hello!
 
  I have some new entity data that I'm indexing which takes the form of:
 
  String: EntityString
  Float: Confidence
 
  I want to add these to a generic Tags field (for faceting), but I'm not
  sure how to hold onto the confidence. Token Payloads seem like one
 method,
  but then I'm not sure how to extract the Payload.
 
  Alternatively I could create two fields: TagIndexed which stores just the
  string value and TagStored which contains a delimited String|Float.
 
  What's the right way to do this?
 
  Thanks!
 
  -D


Re: logging in solr

2014-08-18 Thread Aurélien MAZOYER

Hi,

Are you using Tomcat or Jetty? If you use the default Jetty, have a look 
at: http://wiki.apache.org/solr/LoggingInDefaultJettySetup


Regards,

Aurélien


Le 18/08/2014 22:43, M, Arjun (NSN - IN/Bangalore) a écrit :

Hi,

 Currently in my component Solr is logging to catalina.out. What is the 
configuration needed to redirect those logs to some custom logfile eg: Solr.log.

 Thanks...

--Arjun






Re: logging in solr

2014-08-18 Thread Aurélien MAZOYER
Sorry, outdated link. And I suppose you use Tomcat, since you are talking 
about catalina.out. The correct link is: 
http://wiki.apache.org/solr/SolrLogging#Solr_4.3_and_above
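
A minimal sketch of such a log4j.properties (the path is a placeholder),
placed on Tomcat's classpath so Solr's SLF4J/log4j binding picks it up:

log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/solr/solr.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %p %c %m%n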



Le 18/08/2014 23:06, Aurélien MAZOYER a écrit :


Hi,

Are you using Tomcat or Jetty? If you use the default Jetty, have a 
look at: http://wiki.apache.org/solr/LoggingInDefaultJettySetup


Regards,

Aurélien


Le 18/08/2014 22:43, M, Arjun (NSN - IN/Bangalore) a écrit :

Hi,

 Currently in my component Solr is logging to catalina.out. 
What is the configuration needed to redirect those logs to some 
custom logfile, e.g. solr.log?


 Thanks...

--Arjun








Re: Need details on this query

2014-08-18 Thread Erick Erickson
OR (phone
-data_source_name:("Catalog" OR "Device How To - Interactive" OR "Device How
To - StepByStep"))

Just an OR clause that searches for all documents that have "phone" (in the
default search field, or multiple fields if it's an edismax parser), removes from
that set any documents with a data_source_name that contains any of the
three phrases:
Catalog
Device How To - Interactive
Device How To - StepByStep

and returns all those documents in the query results.

HTH,
Erick

On Mon, Aug 18, 2014 at 11:42 AM, bbarani bbar...@gmail.com wrote:
 Hi,

 This might be a silly question..

 I came across the query below online, but I couldn't really understand the
 bolded part. Can someone help me understand this part of the query?

 deviceType_:Cell OR deviceType_:Prepaid *OR (phone
 -data_source_name:("Catalog" OR "Device How To - Interactive" OR "Device How
 To - StepByStep"))*

 Thanks,
 Barani





Re: How to search for phrase IAE_UPC_0001

2014-08-18 Thread Erick Erickson
NP, glad you're making forward progress!

Erick

On Mon, Aug 18, 2014 at 12:31 PM, Paul Rogers paul.roge...@gmail.com wrote:
 Hi Erick

 Thanks for the assist.  Did as you suggested (tho' I used Nutch).  Cleared
 out Solr's index and Nutch's crawl DB, and then emptied all the documents
 out of the web server bar 10 of each type (IAE-UPC- and IAE_UPC_).
 Then crawled the site using Nutch.

 Then confirmed that all 20 docs had been uploaded and that a *:* search
 returned all 20 docs.

 Now when I do a url search on either (for example) q=url:IAE-UPC-220 or
 q=url:IAE_UPC_0001, I get a result returned for each, i.e. it now
 works as expected.

 So seems I now need to figure out why Nutch isn't crawling the documents.

 Again many thanks.

 P




 On 18 August 2014 11:22, Erick Erickson erickerick...@gmail.com wrote:

 I'd pull Nutch out of the mix here as a test. Create
 some test docs (use the exampleDocs directory?) and
 go from there, at least long enough to ensure that Solr
 does what you expect if the data gets there properly.

 You can set this up in about 10 minutes, and test it
 in about 15 more. May save you endless hours.

 Because you're conflating two issues here:
 1) whether Nutch is sending the data
 2) whether Solr is indexing and searching as you expect.

 Some of the Solr/Lucene analysis chains do transformations
 that may not be what you assume, particularly things
 like StandardTokenizer and WordDelimiterFilterFactory.

 So I'd take the time to see that the values you're dealing
 with are behaving as you expect. The admin/analysis page
 will help you a _lot_ here.

 Best,
 Erick




 On Mon, Aug 18, 2014 at 7:16 AM, Paul Rogers paul.roge...@gmail.com
 wrote:
  Hi Guys
 
  I've been checking into this further and have deleted the index a couple
 of
  times and rebuilt it with the suggestions you've supplied.
 
  I had a bit of an epiphany last week and decided to check if the document I
  was searching for was actually in the index (did this by doing a *:* query
  to a file and grep'ing for the 'IAE_UPC_0001' string).  It seems it isn't!!
  Not sure if it was in the original index or not, tho' I suspect not.
 
  As far as I can see, anything with the reference in the form IAE_UPC_
  has not been indexed, while those with the reference in the form
  IAE-UPC- have.  Not sure if that's a coincidence or not.
 
  Need to see if I can get the docs into the index and then check if the
  search works or not.  Will see if the guys on the Nutch list can shed any
  light.
 
  All the best.
 
  P
 
 
  On 4 August 2014 17:09, Jack Krupansky j...@basetechnology.com wrote:
 
   The standard tokenizer treats underscore as a valid token character,
   not a delimiter.
 
  The word delimiter filter will treat underscore as a delimiter though.
 
   Make sure your query-time WDF does not have preserveOriginal=1 - but the
   index-time WDF should have preserveOriginal=1. Otherwise, the query
   phrase will generate an extra token which will participate in the
   matching and might cause a mismatch.
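
   For illustration, a fieldType along these lines (just a sketch - the
   names are invented and only the WDF attributes that matter here are
   shown) applies that rule:

   <fieldType name="text_url" class="solr.TextField">
     <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1" generateNumberParts="1"
               preserveOriginal="1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1" generateNumberParts="1"
               preserveOriginal="0"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>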
 
  -- Jack Krupansky
 
  -Original Message- From: Paul Rogers
  Sent: Monday, August 4, 2014 5:55 PM
 
  To: solr-user@lucene.apache.org
  Subject: Re: How to search for phrase IAE_UPC_0001
 
  Hi Guys
 
   Thanks for the replies.  I've had a look at the WordDelimiterFilterFactory
   and the Term Info for the url field.  It seems that all the terms exist
   and I now understand that each url is being broken up using the delimiters
   specified.  But I think I'm still missing something.
 
  Am I correct in assuming the minus sign (-) is also a delimiter?
 
   If so why then does url:"IAE-UPC-0001" return a result (when the url
   contains the substring IAE-UPC-0001) whereas url:"IAE_UPC_0001" doesn't
   (when the url contains the substring IAE_UPC_0001)?
 
   Secondly if the url has indeed been broken into the terms IAE UPC and
   0001, why do all the searches suggested or tried succeed when the
   delimiter is a minus sign (-) but not when the delimiter is an
   underscore (_), returning zero matches?
 
   Finally, shouldn't the query url:"IAE UPC 0001"~1 work since all it is
  looking for is the three terms?
 
  Many thanks for any enlightenment.
 
  P
 
 
 
 
  On 4 August 2014 01:33, Harald Kirsch harald.kir...@raytion.com
 wrote:
 
   This all depends on how the tokenizers take your URLs apart. To quickly
   see what ended up in the index, go to a core in the UI, select Schema
   Browser, select the field containing your URLs, click on Load Term Info.
 
   In your case, for the field holding the URL you could try to switch to a
   tokenizer that defines tokens as a sequence of alphanumeric characters,
   roughly [a-z0-9]+ plus diacritics. In particular, punctuation and
   separation characters like dash, underscore, slash, dot and the like
   would never be part of a token, i.e. they don't make a difference.
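
   A sketch of such a fieldType (the pattern here is an assumption -
   adjust it to your data):

   <fieldType name="text_alnum" class="solr.TextField">
     <analyzer>
       <tokenizer class="solr.PatternTokenizerFactory"
                  pattern="[^\p{L}\p{N}]+"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>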
 
  Then you can search the url parts with a phrase query (
  

Re: Combining a String Tag with a Numeric Value

2014-08-18 Thread Erick Erickson
If you're doing this in a sharded environment, it may be interesting.
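
That said, if all you need is the per-tag mean, the StatsComponent may
get you there without custom code - a sketch, assuming the confidence is
indexed as its own single-valued numeric field:

  q=Tags:Tag1&stats=true&stats.field=Confidence&rows=0

The "mean" entry in the stats section of the response is the average,
and the StatsComponent handles distributed requests.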

Good Luck!

Erick

On Mon, Aug 18, 2014 at 2:03 PM, Dave Seltzer dselt...@tveyes.com wrote:
 Thanks Erick,

 I'm not sure I need to score the documents based on the numeric value, but
 I am interested in being able to calculate the average (Mean) of all the
 numeric values for a given tag. For example, what is the average confidence
 of Tag1 across all documents.

 I'm not sure I can do that without building a FunctionQuery.

 -Dave


 On Mon, Aug 18, 2014 at 12:46 PM, Erick Erickson erickerick...@gmail.com
 wrote:

 Hmmm, there's no particular right way. It'd be simpler
 to index these as two separate fields _if_ there's only
 one pair per document. If there are more and you index them
  as two multiValued fields, there's no good way at _query_ time
 to retain the association. The returned multiValued fields are
 guaranteed to be in the same order of insertion so you can
 display the correct pairs, but you can't use the association
 to score docs. Hmmm, somewhat abstract. OK say you want to
 associate two tag/value pairs, tag1:50 and tag2:100. Say further
 that you have two multiValued fields, Tags and Values and then
 index tag1 and tag2 into Tags and 50 and 100 into Values.
 There's no good way to express q=tags:tag1 and factor the
  associated value of 50 into the score.

 Note that the returned _values_ will be
 Tags:   tag1 tag2
 Values  50  100

 So at that point you can see the associations.

 that said, if there's only _one_ such tag/value pair per document,
 it's easy to write a FunctionQuery (
 http://wiki.apache.org/solr/FunctionQuery)
 that does this.
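
  A minimal sketch of that idea (field names assumed; Values must be a
  single-valued numeric field):

  q={!boost b=field(Values)}Tags:tag1

  which multiplies each matching doc's score by its Values entry.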

 ***

 If you have many tag/value pairs, payloads are probably what you want.
 Here's an end-to-end example:

 http://searchhub.org/2014/06/13/end-to-end-payload-example-in-solr/

 Best,
 Erick

 On Mon, Aug 18, 2014 at 7:32 AM, Dave Seltzer dselt...@tveyes.com wrote:
  Hello!
 
  I have some new entity data that I'm indexing which takes the form of:
 
  String: EntityString
  Float: Confidence
 
   I want to add these to a generic Tags field (for faceting), but I'm not
   sure how to hold onto the confidence. Token Payloads seem like one
   method, but then I'm not sure how to extract the Payload.
 
  Alternatively I could create two fields: TagIndexed which stores just the
  string value and TagStored which contains a delimited String|Float.
 
  What's the right way to do this?
 
  Thanks!
 
  -D


Re: logging in solr

2014-08-18 Thread Shawn Heisey
On 8/18/2014 2:43 PM, M, Arjun (NSN - IN/Bangalore) wrote:
 Currently, in my component, Solr is logging to catalina.out. What
 configuration is needed to redirect those logs to a custom logfile,
 e.g. solr.log?

Solr uses the slf4j library for logging.  Simply change your program to
use slf4j, and very likely the logs will go to the same place the Solr
logs do.

http://www.slf4j.org/manual.html
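
In code that means something like this (the class name is just an
example):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MyComponent {
  // Resolved against whatever slf4j binding is on Solr's classpath.
  private static final Logger log =
      LoggerFactory.getLogger(MyComponent.class);

  public void doSomething() {
    // Ends up in the same appenders as Solr's own log output.
    log.info("component event");
  }
}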

See also the wiki page on logging jars and Solr:

http://wiki.apache.org/solr/SolrLogging
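
If you end up on the log4j binding described there, a minimal
log4j.properties sketch that sends everything to its own rolling file
(the path and sizes are just examples):

log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=logs/solr.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p (%t) [%c{1}] %m%n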

Thanks,
Shawn



[ANNOUNCE] [SECURITY] Recommendation to update Apache POI in Apache Solr 4.8.0, 4.8.1, and 4.9.0 installations

2014-08-18 Thread Uwe Schindler
Hello Apache Solr Users,

the Apache Lucene PMC wants to make the users of Solr aware of the following 
issue:

Apache Solr versions 4.8.0, 4.8.1, 4.9.0 bundle Apache POI 3.10-beta2 with its 
binary release tarball. This version (and all previous ones) of Apache POI is 
vulnerable to the following issues:

= CVE-2014-3529: XML External Entity (XXE) problem in Apache POI's OpenXML 
parser =
Type: Information disclosure
Description: Apache POI uses Java's XML components to parse OpenXML files 
produced by Microsoft Office products (DOCX, XLSX, PPTX,...). Applications that 
accept such files from end-users are vulnerable to XML External Entity (XXE) 
attacks, which allows remote attackers to bypass security restrictions and read 
arbitrary files via a crafted OpenXML document that provides an XML external 
entity declaration in conjunction with an entity reference.

= CVE-2014-3574: XML Entity Expansion (XEE) problem in Apache POI's OpenXML 
parser =
Type: Denial of service
Description: Apache POI uses Java's XML components and Apache Xmlbeans to parse 
OpenXML files produced by Microsoft Office products (DOCX, XLSX, PPTX,...). 
Applications that accept such files from end-users are vulnerable to XML Entity 
Expansion (XEE) attacks (XML bombs), which allows remote attackers to consume 
large amounts of CPU resources.

The Apache POI PMC released a bugfix version (3.10.1) today.

Solr users are affected by these issues if they enable the Apache Solr 
Content Extraction Library (Solr Cell) contrib module from the folder 
contrib/extraction of the release tarball.

Users of Apache Solr are strongly advised to keep the module disabled if they 
don't use it. Alternatively, users of Apache Solr 4.8.0, 4.8.1, or 4.9.0 can 
update the affected libraries by replacing the vulnerable JAR files in the 
distribution folder. Users of previous versions have to update their Solr 
release first; patching older versions is impossible.

To replace the vulnerable JAR files, follow these steps (a shell sketch of the 
same procedure follows the list):

- Download the Apache POI 3.10.1 binary release: 
http://poi.apache.org/download.html#POI-3.10.1
- Unzip the archive
- Delete the following files in your solr-4.X.X/contrib/extraction/lib 
folder: 
# poi-3.10-beta2.jar
# poi-ooxml-3.10-beta2.jar
# poi-ooxml-schemas-3.10-beta2.jar
# poi-scratchpad-3.10-beta2.jar
# xmlbeans-2.3.0.jar
- Copy the following files from the base folder of the Apache POI distribution 
to the solr-4.X.X/contrib/extraction/lib folder: 
# poi-3.10.1-20140818.jar
# poi-ooxml-3.10.1-20140818.jar
# poi-ooxml-schemas-3.10.1-20140818.jar
# poi-scratchpad-3.10.1-20140818.jar
- Copy xmlbeans-2.6.0.jar from POI's ooxml-lib/ folder to the 
solr-4.X.X/contrib/extraction/lib folder.
- Verify that the solr-4.X.X/contrib/extraction/lib folder no longer contains 
any 
files with version number 3.10-beta2.
- Verify that the folder contains one xmlbeans JAR file with version 2.6.0.
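
For convenience, the same procedure as a shell sketch (SOLR_HOME and
POI_DIST are placeholders for your Solr installation and the unzipped
POI distribution; verify against the list above):

  cd $SOLR_HOME/contrib/extraction/lib
  rm poi-3.10-beta2.jar poi-ooxml-3.10-beta2.jar \
     poi-ooxml-schemas-3.10-beta2.jar poi-scratchpad-3.10-beta2.jar \
     xmlbeans-2.3.0.jar
  cp $POI_DIST/poi-3.10.1-20140818.jar .
  cp $POI_DIST/poi-ooxml-3.10.1-20140818.jar .
  cp $POI_DIST/poi-ooxml-schemas-3.10.1-20140818.jar .
  cp $POI_DIST/poi-scratchpad-3.10.1-20140818.jar .
  cp $POI_DIST/ooxml-lib/xmlbeans-2.6.0.jar .
  ls | grep -E '3\.10-beta2|xmlbeans-2\.3\.0'   # should print nothing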

If you just want to disable extraction of Microsoft Office documents, delete 
the files above and don't replace them. Solr Cell will automatically detect 
this and disable Microsoft Office document extraction.

Coming versions of Apache Solr will have the updated libraries bundled.

Happy Searching and Extracting,
The Apache Lucene Developers

PS: Thanks to Stefan Kopf, Mike Boufford, and Christian Schneider for reporting 
these issues!

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/





Apache Solr Wiki

2014-08-18 Thread Mark Sun
Dear Solr Wiki admin,

We are using Solr for our multilingual Asian-language keyword search, as
well as for a visual similarity search engine (via the pixolution plugin).
We would like to update the Powered by Solr section, as well as help add
to the knowledge base for other Solr setups.

Can you add me, username MarkSun, as a contributor to the wiki?

Thank you!

Cheers,
Mark Sun
CTO

MotionElements Pte Ltd
190 Middle Road, #10-05 Fortune Centre
Singapore 188979
mark...@motionelements.com

www.motionelements.com
=
Asia-inspired Stock Animation | Video Footage | AE Template online
marketplace
=
This message may contain confidential and/or privileged information.  If
you are not the addressee or authorized to receive this for the addressee,
you must not use, copy, disclose or take any action based on this message
or any information herein. If you have received this message in error,
please advise the sender immediately by reply e-mail and delete this
message.  Thank you for your cooperation.


Re: Apache Solr Wiki

2014-08-18 Thread Erick Erickson
Done, you should have edit rights now!

Best,
Erick

On Mon, Aug 18, 2014 at 6:01 PM, Mark Sun mark...@motionelements.com wrote:
 Dear Solr Wiki admin,

 We are using Solr for our multilingual Asian-language keyword search, as
 well as for a visual similarity search engine (via the pixolution plugin).
 We would like to update the Powered by Solr section, as well as help add
 to the knowledge base for other Solr setups.

 Can you add me, username MarkSun, as a contributor to the wiki?

 Thank you!

 Cheers,
 Mark Sun
 CTO

 MotionElements Pte Ltd
 190 Middle Road, #10-05 Fortune Centre
 Singapore 188979
 mark...@motionelements.com

 www.motionelements.com
 =
 Asia-inspired Stock Animation | Video Footage | AE Template online
 marketplace
 =
 This message may contain confidential and/or privileged information.  If
 you are not the addressee or authorized to receive this for the addressee,
 you must not use, copy, disclose or take any action based on this message
 or any information herein. If you have received this message in error,
 please advise the sender immediately by reply e-mail and delete this
 message.  Thank you for your cooperation.


Apache solr sink issue

2014-08-18 Thread Jeniba Johnson
Hi,

I want to index a log file in Solr using Flume + the Apache Solr sink.
I am referring to the below-mentioned URL:
https://cwiki.apache.org/confluence/display/FLUME/How+to+Setup+Solr+Sink+for+Flume


Error from the Flume console:
2014-08-19 15:38:56,451 (concurrentUpdateScheduler-2-thread-1) [ERROR - 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.handleError(ConcurrentUpdateSolrServer.java:354)]
 error
java.lang.Exception: Bad Request
request: http://xxx.xx.xx:8983/solr/update?wt=javabin&version=2
at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:208)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)


Error from the Solr console:
473844 [qtp176433427-19] ERROR org.apache.solr.core.SolrCore - 
org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey 
field: id
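
If I read the MorphlineSolrSink setup right, this Solr error means the
documents arrive without the id field that my schema declares as the
uniqueKey. Would adding the Kite Morphlines generateUUID command before
loadSolr in the morphline, roughly

  { generateUUID { field : id } }

be the right fix, or is something else missing in my configuration?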


Can anyone help me with this issue, and with the steps for integrating
Flume with the Solr sink?



Regards,
Jeniba Johnson



The contents of this e-mail and any attachment(s) may contain confidential or 
privileged information for the intended recipient(s). Unintended recipients are 
prohibited from taking action on the basis of information in this e-mail and 
using or disseminating the information, and must notify the sender and delete 
it from their system. L&T Infotech will not accept responsibility or liability 
for the accuracy or completeness of, or the presence of any virus or disabling 
code in this e-mail.