Re: Solr: How to index range-pair fields?

2015-08-22 Thread Alexandre Rafalovitch
Sorry Venkat, this is pushing beyond my immediate knowledge. You'd
just need to experiment.

But the document still looks a bit wrong; specifically, I don't
understand where those extra 366 values are coming from. It should
be just two-dimensional coordinates: the first for the start of the
range, the second for the end. You seem to have two extra, useless ones.
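
For reference, the spatial trick described above stores each range as a single
two-dimensional point: the start of the range is the x coordinate and the end
is the y coordinate, so each range contributes exactly two values. A sketch of
an absence doc under that scheme (the field name and the "x,y" literal format
are illustrative; the exact point syntax depends on the chosen field type):

```text
{
  "id": "X",
  "absentDays": ["1,15", "44,47", "78,84"]
}
```

A query such as "absences overlapping days 40-50" then becomes a spatial
rectangle query over these points.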

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 21 August 2015 at 21:29, vaedama sudheer.u...@gmail.com wrote:
 Alexandre,

 Fantastic answer! I think having a start position would work nicely with my
 use-case :) Also, I would prefer to do the date math during indexing.

 *Question # 1:* Can you please tell me if this doc looks correct (given that
 I am not yet bothered about factoring the year into my use-case)?

 Student X was `absent` between dates:

  Jan 1, 2015 and Jan 15, 2015
  Feb 13, 2015 and Feb 16, 2015 (assuming that Feb 13 is the 43rd day of
 the year 2015 and Feb 16 the 46th)
  March 19, 2015 and March 25, 2015
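
As an aside, the day-of-year arithmetic is easy to double-check with java.time
(2015 is not a leap year). Note this gives 44 and 47 for the February range,
one more than the values assumed above, while the March values match 78 and 84:

```java
import java.time.LocalDate;

public class DayOfYearCheck {
    public static void main(String[] args) {
        // Day-of-year values for the range endpoints in 2015 (non-leap year)
        System.out.println(LocalDate.of(2015, 2, 13).getDayOfYear()); // 44 (31 + 13)
        System.out.println(LocalDate.of(2015, 2, 16).getDayOfYear()); // 47
        System.out.println(LocalDate.of(2015, 3, 19).getDayOfYear()); // 78
        System.out.println(LocalDate.of(2015, 3, 25).getDayOfYear()); // 84
    }
}
```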

 Also X was `present` between dates:

  Jan 25, 2015 and Jan 30, 2015
  Feb 1, 2015 and Feb 12, 2015

 {
   id: X,
   state: [absent, present],
   presentDays: [ [01 15 366 366], [43, 46, 366, 366], [78, 84, 366, 366] ],
   absentDays: [ [25, 30, 366, 366],  [32, 43, 366, 366] ]
 }

 *Question #2:*

 Since I need timestamp-level granularity, what is the appropriate way to
 store the field?

 Student X was `absent` between epoch times:

  1420104600 (9:30 AM, Jan 1 2015) and 1421341200 (5:00 PM, Jan 15, 2015)

 Is it possible to change *worldBounds* to take a polygon structure where I
 can represent millisecond-level granularity?
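
For what it's worth, the epoch values quoted above check out as UTC epoch
seconds; a quick java.time verification:

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class EpochCheck {
    public static void main(String[] args) {
        // 9:30 AM Jan 1, 2015 UTC and 5:00 PM Jan 15, 2015 UTC, as epoch seconds
        long start = ZonedDateTime.of(2015, 1, 1, 9, 30, 0, 0, ZoneOffset.UTC).toEpochSecond();
        long end = ZonedDateTime.of(2015, 1, 15, 17, 0, 0, 0, ZoneOffset.UTC).toEpochSecond();
        System.out.println(start); // 1420104600
        System.out.println(end);   // 1421341200
    }
}
```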

 Thanks in advance,
 Venkat Sudheer Reddy Aedama




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-How-to-index-range-pair-fields-tp4224369p4224582.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Too many updates received since start

2015-08-22 Thread Yago Riveiro
Hi,

Can anyone explain the possible causes of this warning?

too many updates received since start - startingUpdates no longer overlaps
with our currentUpdates

This warning triggers a full recovery for the shard that throws it.



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Too-many-updates-received-since-start-tp4224617.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to use DocumentAnalysisRequestHandler in java

2015-08-22 Thread Xavier Tannier

Hi,
Faceting is indeed the best way to do it.
Here is how it looks in Java:


SolrQuery query = new SolrQuery();
query.setQuery("id:" + docId);
query.setFacet(true);
query.addFacetField("text");   // You can add all the fields you want to inspect
query.setFacetMinCount(1);     // Otherwise you'll even get tokens that are not in your document


QueryResponse rsp = this.index.query(query);

// Now look at the results (for field "text")
FacetField facetField = rsp.getFacetField("text");
for (Count field : facetField.getValues()) {
    System.out.println(field.getName());
}


Xavier.



On 20/08/2015 22:20, Upayavira wrote:


On Thu, Aug 20, 2015, at 04:34 PM, Jean-Pierre Lauris wrote:

Hi,
I'm trying to obtain indexed tokens from a document id, in order to see
what has been indexed exactly.
It seems that DocumentAnalysisRequestHandler does that, but I couldn't
figure out how to use it in java.

The doc says I must provide a contentstream but the available init()
method
only takes a NamedList as a parameter.
https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/handler/DocumentAnalysisRequestHandler.html

Could somebody provide me with a short example of how to get index
information from a document id?

If you are talking about what I think you are, then that is used by the
Admin UI to implement the analysis tab. You pass in a document, and it
returns it analysed.

As Alexandre says, faceting may well get you there if you want to query
a document already in your index.

Upayavira




--
Xavier Tannier
Associate Professor / Maître de conférence (HDR)
Univ. Paris-Sud
LIMSI-CNRS (bât. 508, bureau 12, RdC)
B.P. 133
91403 ORSAY CEDEX
FRANCE

http://www.limsi.fr/~xtannier/
tel: 0033 (0)1 69 85 80 12
fax: 0033 (0)1 69 85 80 88
---


Re: Collapse Expand

2015-08-22 Thread Joel Bernstein
Can you explain your use case a little more?



Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Aug 21, 2015 at 5:43 PM, Kiran Sai Veerubhotla sai.sq...@gmail.com
wrote:

 how can I use collapse & expand on the docValues with the JSON facet API?



Re: Too many updates received since start

2015-08-22 Thread Shawn Heisey
On 8/22/2015 11:51 AM, Yago Riveiro wrote:
 My heap is about 24G and I tuned it using this link
 https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr

 Shawn has updated it since I used it, and some of the configuration is no
 longer in that document.

 I see pauses of about 6s in my GC logs; my index has a high indexing rate,
 > 1000 docs/s.

 I'm running Java 7u25; maybe upgrading to Java 8 would reduce the GC pauses.

 I don't know if it is safe to use Java 8 in production with Solr ...

If I remember right, you are running SolrCloud ... which means you're on
at least 4.x.

I have not heard about any problems running Solr in Java 8, but I only
have concrete information for 4.x and later.  I've heard indirectly that
3.x does work, but I haven't confirmed that rumor.  I am running 4.9.1
on Java 8 for one of my indexes and it is working very well.

Whether you use Java 7 or 8, you should definitely use the latest
release.  OpenJDK 7 and later is good, but the Oracle version is
recommended.

Thanks,
Shawn



Re: Can TrieDateField fields be null?

2015-08-22 Thread Erick Erickson
TrieDateField fields can be null; or more precisely, the field can simply
be absent from the document. I just verified this with 4.10.

How are you indexing? I suspect that somehow the program that's sending
things to Solr is putting the default time in.

What version of Solr?

Best,
Erick

On Sat, Aug 22, 2015 at 4:04 PM, Henrique O. Santos hensan...@gmail.com wrote:
 Hello,

 Just a simple question. Can TrieDateField fields be null? I have a schema
 with the following field and type:
 <field name="started_at" type="date" indexed="true" stored="true"
 docValues="true" />
 <fieldType name="date" class="solr.TrieDateField" precisionStep="0"
 positionIncrementGap="0"/>

 Every time I index a document with no value for this field, the current time
 gets indexed and stored. Is there any way to make this field null?

 My use case for this collection requires that I check if that date field is
 already filled or not.

 Thank you,
 Henrique.
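
For the "is the field already filled" check, a standard approach (independent
of the null question) is a range-existence filter query; a sketch, assuming
the `started_at` field name from the schema above:

```text
fq=started_at:[* TO *]     # documents where started_at has a value
fq=-started_at:[* TO *]    # documents where started_at is missing
```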


Re: Too many updates received since start

2015-08-22 Thread Shawn Heisey
On 8/22/2015 3:50 PM, Yago Riveiro wrote:
 I'm using Java 7u25 (Oracle version) with Solr 4.6.1

 It works well with > 98% of throughput, but during some full GCs the issue
 arises. A full sync for one shard is more than 50G.

 Is there any configuration to set the number of docs that a replica can be
 behind the leader?

It looks like the number of docs is configurable in 5.1 and later:

https://issues.apache.org/jira/browse/SOLR-6359

There is apparently a caveat related to SolrCloud recovery, which I am
having trouble grasping:

"the 20% newest existing transaction log of the core to be recovered
must be newer than the 20% oldest existing transaction log of the good
core."
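
For reference, the knob added by SOLR-6359 lives on the update log in
solrconfig.xml; a sketch (the numbers are illustrative, not recommendations):

```xml
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <!-- keep more records per transaction log so a lagging replica can
       peer-sync instead of falling into full recovery -->
  <int name="numRecordsToKeep">500</int>
  <int name="maxNumLogsToKeep">20</int>
</updateLog>
```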

Thanks,
Shawn



Re: Solr performance is slow with just 1GB of data indexed

2015-08-22 Thread Zheng Lin Edwin Yeo
Hi Shawn,

Yes, I've increased the heap size to 4GB already, and I'm using a machine
with 32GB RAM.

Is it recommended to further increase the heap size to like 8GB or 16GB?

Regards,
Edwin
On 23 Aug 2015 10:23, Shawn Heisey apa...@elyograg.org wrote:

 On 8/22/2015 7:31 PM, Zheng Lin Edwin Yeo wrote:
  I'm using Solr 5.2.1, and I've indexed about 1GB of data into Solr.
 
  However, I find that clustering is exceedingly slow after I index this 1GB
  of data. It took almost 30 seconds to return the cluster results when I set
  it to cluster the top 1000 records, and it still takes more than 3 seconds
  when I set it to cluster the top 100 records.
 
  Is this speed normal? Because I understand Solr can index terabytes of data
  without the performance being impacted so much, but now the collection is
  slowing down with just 1GB of data.

 Have you increased the heap size?  If you simply start Solr 5.x with the
 included script and don't use any commandline options, Solr will only
 have a 512MB heap.  This is *extremely* small.  A significant chunk of
 that 512MB heap will be required just to start Jetty and Solr, so
 there's not much memory left for manipulating the index data and serving
 queries.  Assuming you have at least 4GB of RAM, try adding "-m 2g" to
 the start commandline.

 Thanks,
 Shawn




Re: Too many updates received since start

2015-08-22 Thread Yago Riveiro
I'm using Java 7u25 (Oracle version) with Solr 4.6.1

It works well with > 98% of throughput, but during some full GCs the issue
arises. A full sync for one shard is more than 50G.

Is there any configuration to set the number of docs that a replica can be
behind the leader?

On Sat, Aug 22, 2015 at 8:53 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 8/22/2015 11:51 AM, Yago Riveiro wrote:
 My heap is about 24G and I tuned it using this link
 https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr

 Shawn has updated it since I used it, and some of the configuration is no
 longer in that document.

 I see pauses of about 6s in my GC logs; my index has a high indexing rate,
 > 1000 docs/s.

 I'm running Java 7u25; maybe upgrading to Java 8 would reduce the GC pauses.

 I don't know if it is safe to use Java 8 in production with Solr ...
 If I remember right, you are running SolrCloud ... which means you're on
 at least 4.x.
 I have not heard about any problems running Solr in Java 8, but I only
 have concrete information for 4.x and later.  I've heard indirectly that
 3.x does work, but I haven't confirmed that rumor.  I am running 4.9.1
 on Java 8 for one of my indexes and it is working very well.
 Whether you use Java 7 or 8, you should definitely use the latest
 release.  OpenJDK 7 and later is good, but the Oracle version is
 recommended.
 Thanks,
 Shawn

Can TrieDateField fields be null?

2015-08-22 Thread Henrique O. Santos

Hello,

Just a simple question. Can TrieDateField fields be null? I have a 
schema with the following field and type:
<field name="started_at" type="date" indexed="true" stored="true"
docValues="true" />
<fieldType name="date" class="solr.TrieDateField" precisionStep="0"
positionIncrementGap="0"/>


Every time I index a document with no value for this field, the current
time gets indexed and stored. Is there any way to make this field null?


My use case for this collection requires that I check if that date field 
is already filled or not.


Thank you,
Henrique.


Solr performance is slow with just 1GB of data indexed

2015-08-22 Thread Zheng Lin Edwin Yeo
Hi,

I'm using Solr 5.2.1, and I've indexed about 1GB of data into Solr.

However, I find that clustering is exceedingly slow after I index this 1GB of
data. It took almost 30 seconds to return the cluster results when I set it
to cluster the top 1000 records, and it still takes more than 3 seconds when I
set it to cluster the top 100 records.

Is this speed normal? Because I understand Solr can index terabytes of data
without the performance being impacted so much, but now the collection is
slowing down with just 1GB of data.

Below is my clustering configuration in solrconfig.xml.

  <requestHandler name="/clustering"
                  startup="lazy"
                  enable="${solr.clustering.enabled:true}"
                  class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">1000</int>
      <str name="wt">json</str>
      <str name="indent">true</str>
      <str name="df">text</str>
      <str name="fl">null</str>

      <bool name="clustering">true</bool>
      <bool name="clustering.results">true</bool>
      <str name="carrot.title">subject content tag</str>
      <bool name="carrot.produceSummary">true</bool>

      <int name="carrot.fragSize">20</int>
      <!-- the maximum number of labels per cluster -->
      <int name="carrot.numDescriptions">20</int>
      <!-- produce sub clusters -->
      <bool name="carrot.outputSubClusters">false</bool>
      <str name="LingoClusteringAlgorithm.desiredClusterCountBase">7</str>

      <!-- Configure the remaining request handler parameters. -->
      <str name="defType">edismax</str>
    </lst>
    <arr name="last-components">
      <str>clustering</str>
    </arr>
  </requestHandler>


Regards,
Edwin


Re: Solr performance is slow with just 1GB of data indexed

2015-08-22 Thread Shawn Heisey
On 8/22/2015 7:31 PM, Zheng Lin Edwin Yeo wrote:
 I'm using Solr 5.2.1, and I've indexed about 1GB of data into Solr.
 
 However, I find that clustering is exceedingly slow after I index this 1GB of
 data. It took almost 30 seconds to return the cluster results when I set it
 to cluster the top 1000 records, and it still takes more than 3 seconds when I
 set it to cluster the top 100 records.
 
 Is this speed normal? Because I understand Solr can index terabytes of data
 without the performance being impacted so much, but now the collection is
 slowing down with just 1GB of data.

Have you increased the heap size?  If you simply start Solr 5.x with the
included script and don't use any commandline options, Solr will only
have a 512MB heap.  This is *extremely* small.  A significant chunk of
that 512MB heap will be required just to start Jetty and Solr, so
there's not much memory left for manipulating the index data and serving
queries.  Assuming you have at least 4GB of RAM, try adding "-m 2g" to
the start commandline.

Thanks,
Shawn



Re: Number of requests to each shard is different with and without using of grouping

2015-08-22 Thread Ramkumar R. Aiyengar
M is the number of ids you want for each group, specified by group.limit.
It's unrelated to the number of rows requested.
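
In request-parameter terms (the field name is illustrative), N and M map to
rows and group.limit respectively:

```text
# N = rows (number of groups returned), M = group.limit (top docs kept per group)
q=*:*&group=true&group.field=category&rows=10&group.limit=5
```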
On 21 Aug 2015 19:54, SolrUser1543 osta...@gmail.com wrote:

 Ramkumar R. Aiyengar wrote
  Grouping does need 3 phases. The phases are:
 
 
  (2) For the N groups, each shard is asked for the top M ids (M is
  configurable per request).
 

 What exactly do you mean by "M is configurable per request"? How exactly
 is it configurable, and what is the relation between N (which is the initial
 rows number) and M?




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Number-of-requests-to-each-shard-is-different-with-and-without-using-of-grouping-tp4224293p4224521.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Too many updates received since start

2015-08-22 Thread Susheel Kumar
You can try to follow the suggestions at the link below, which covers a
similar issue, and see if that helps.


http://lucene.472066.n3.nabble.com/ColrCloud-IOException-occured-when-talking-to-server-at-td4061831.html


Thnx

On Sat, Aug 22, 2015 at 9:05 AM, Yago Riveiro yago.rive...@gmail.com
wrote:

 Hi,

 Can anyone explain the possible causes of this warning?

 too many updates received since start - startingUpdates no longer overlaps
 with our currentUpdates

 This warning triggers a full recovery for the shard that throws the
 warning.



 -
 Best regards
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Too-many-updates-received-since-start-tp4224617.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Too many updates received since start

2015-08-22 Thread Yago Riveiro
My heap is about 24G and I tuned it using this link
https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr

Shawn has updated it since I used it, and some of the configuration is no
longer in that document.

I see pauses of about 6s in my GC logs; my index has a high indexing rate,
> 1000 docs/s.

I'm running Java 7u25; maybe upgrading to Java 8 would reduce the GC pauses.

I don't know if it is safe to use Java 8 in production with Solr ...



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Too-many-updates-received-since-start-tp4224617p4224631.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Collapse Expand

2015-08-22 Thread Nagasharath
Using the JSON facet API for nested faceting on docValues.

Trying to improve the query time; I read in a blog that query time on
docValues can be improved with collapse & expand.
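
For reference, collapse & expand are expressed as a post filter plus the
expand component; a sketch (the field name is illustrative, and the collapse
field must be single-valued, with docValues recommended):

```text
q=*:*&fq={!collapse field=group_id}&expand=true&expand.rows=5
```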





 On 22-Aug-2015, at 9:29 am, Joel Bernstein joels...@gmail.com wrote:
 
 Can you explain your use case a little more?
 
 
 
 Joel Bernstein
 http://joelsolr.blogspot.com/
 
 On Fri, Aug 21, 2015 at 5:43 PM, Kiran Sai Veerubhotla sai.sq...@gmail.com
 wrote:
 
 how can I use collapse & expand on the docValues with the JSON facet API?