Re: Solr - zoo with more than 1000 collections

2018-06-29 Thread Yago Riveiro
Solr doesn’t scale very well with ~2K collections, and yes, the bottleneck is
ZooKeeper itself.

ZooKeeper doesn’t perform operations as quickly as expected on znodes with a
lot of children.

In a recovery scenario (a node crash), this limitation hurts a lot: recovery
operations stack up in the work queue because the throughput for consuming the
queue is so low.
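
As a rough illustration of how to watch those child lists, here is a minimal SolrJ
sketch (the ZooKeeper connect string is a placeholder; /collections,
/overseer/queue and /overseer/collection-queue-work are the standard SolrCloud
paths whose child counts grow with collections and pending overseer work):

import java.util.List;
import org.apache.solr.common.cloud.SolrZkClient;

public class OverseerQueueCheck {
  public static void main(String[] args) throws Exception {
    // assumption: your ZooKeeper connect string (include the chroot if you use one)
    String zkHost = "zk1:2181,zk2:2181,zk3:2181/solr";
    SolrZkClient zk = new SolrZkClient(zkHost, 10000);
    try {
      // the big child lists that slow ZooKeeper down
      List<String> collections = zk.getChildren("/collections", null, true);
      // pending overseer work; a queue that only grows during recovery is the symptom described above
      List<String> queue = zk.getChildren("/overseer/queue", null, true);
      List<String> collWork = zk.getChildren("/overseer/collection-queue-work", null, true);
      System.out.printf("collections=%d overseer-queue=%d collection-work=%d%n",
          collections.size(), queue.size(), collWork.size());
    } finally {
      zk.close();
    }
  }
}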

Regards.

--

Yago Riveiro

On 29 Jun 2018 17:38 +0100, Bertrand Mahé , wrote:
> Hi,
>
>
>
> In order to store timeseries data and perform deletion easily, we create a
> several collections per day and then use aliases.
>
>
>
> We are using SOLR 7.3 and we have 2 questions:
>
>
>
> Q1 : In order to access quickly the latest data would it be possible to load
> cores in descending chronological order rather than alphabetical order?
>
>
>
> Q2: When we exceed 1200-1300 collections, zookeeper suddenly changes from
> 6-700 KB RAM to 3 GB RAM which makes zoo very slow or almost unusable. Is
> this normal?
>
>
>
> Thanks in advance,
>
>
>
> Bertrand
>


Re: Largest number of indexed documents used by Solr

2018-04-03 Thread Yago Riveiro
Hi,

In my company we are running a 12-node cluster with 10 billion (US billion)
documents, 12 shards / 2 replicas.

We run mainly faceting queries with very reasonable performance.

36 million documents isn't an issue; you can handle that volume of documents
with 2 nodes with SSDs and 32 GB of RAM.

Regards.

--

Yago Riveiro

On 4 Apr 2018 02:15 +0100, Abhi Basu <9000r...@gmail.com>, wrote:
> We have tested Solr 4.10 with 200 million docs with avg doc size of 250 KB.
> No issues with performance when using 3 shards / 2 replicas.
>
>
>
> On Tue, Apr 3, 2018 at 8:12 PM, Steven White <swhite4...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > I'm about to start a project that requires indexing 36 million records
> > using Solr 7.2.1. Each record range from 500 KB to 0.25 MB where the
> > average is 0.1 MB.
> >
> > Has anyone indexed this number of records? What are the things I should
> > worry about? And out of curiosity, what is the largest number of records
> > that Solr has indexed which is published out there?
> >
> > Thanks
> >
> > Steven
> >
>
>
>
> --
> Abhi Basu


Re: Solr 6. 3 Can not talk to ZK Updates are disabled

2018-04-02 Thread Yago Riveiro
Hi murugesh,

This error normally happens when you are hitting long GC pauses. Try raising the
heap memory.

The only way to recover from this is to restart the affected node.

Regards.

--

Yago Riveiro

On 2 Apr 2018 15:39 +0100, murugesh karmegam <kmak...@gmail.com>, wrote:
> We noticed this issue in our solr clusters right after when Solr cluster is
> restarted or Solr cluster is live for some time. Based on my research so
> far... I am not seeing zookeeper connection issues from zk server side. It
> seems it is solr side ( zk client) side. This issue is pretty constant now
> and then.
>
> Error 1 Solr:
>
> WARN - 2018-02-06 17:35:04.742;
> org.apache.solr.common.cloud.ConnectionManager; Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
> ERROR - 2018-02-06 17:35:04.743; org.apache.solr.common.SolrException;
> org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are
> disabled.
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1508)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:696)
> at
> org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
>
>
> Error 2:
>
> From ingestor log:
> /var/log/mwired/core-ingestors/app.log.9:2018-03-30 05:44:52,616 [-38] ERROR
> org.apache.solr.client.solrj.impl.CloudSolrClient - Request to collection
> failed due to (503)
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
> from server at : Cannot talk to ZooKeeper - Updates are disabled., retry? 0
> /var/log/mwired/core-ingestors/app.log.9:com.mwired.grid.commons.exception.PersistenceException:
> Failed to add 11 docs to solr0 collection , cachedDocs=118; because Error
> from server at : Cannot talk to ZooKeeper - Updates are disabled.
>
>
> Wondering is there any fix? Appreciate any input.
>
> http://lucene.472066.n3.nabble.com/Cannot-talk-to-ZooKeeper-Updates-are-disabled-Solr-6-3-0-td4311582.html
> http://lucene.472066.n3.nabble.com/6-6-Cannot-talk-to-ZooKeeper-Updates-are-disabled-td4352917.html
> https://issues.apache.org/jira/browse/SOLR-3274
>
> Thanks in advance.
> Murux
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Protect a collection to be deleted

2017-12-13 Thread Yago Riveiro
That can work, but the goal is to avoid human error (like a UI that forces you
to type the name of the collection on delete), independently of the access
level.

Regards

--

/Yago Riveiro

On 12 Dec 2017 20:24 +, Anshum Gupta <ansh...@apple.com>, wrote:
> You might want to explore Rule based authorization in Solr and stop non-admin 
> users from deleting collections etc. Here’s the link to the documentation: 
> https://lucene.apache.org/solr/guide/6_6/rule-based-authorization-plugin.html
>
> -Anshum
>
>
>
> > On Dec 12, 2017, at 9:27 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:
> >
> > Hi,
> >
> > Is it possible in Solr protect a collection to be deleted through a
> > property?
> >
> > Regards
> >
> >
> >
> >
> > -
> > Best regards
> >
> > /Yago
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Protect a collection to be deleted

2017-12-12 Thread Yago Riveiro
Thanks Shawn for taking the question to Jira.

Indeed, I want to continue to insert data into the collection.

I find that deleting a collection by mistake using the API is too easy and
prone to human error.

Regards,

--

Yago Riveiro

On 12 Dec 2017 19:05 +, Shawn Heisey <apa...@elyograg.org>, wrote:
> On 12/12/2017 10:27 AM, Yago Riveiro wrote:
> > Is it possible in Solr protect a collection to be deleted through a
> > property?
>
> I doubt that this is possible at the moment.
>
> The suggestion from Markus to change permissions on the index files
> would prevent the actual index from being deleted, but I suspect that a
> Collections API delete would still remove the collection from the
> cloud.  Also, it would prevent any changes to the index, which reduces
> Solr's usefulness.
>
> An update processor to block mistakes is an interesting idea, but again,
> doesn't keep you from deleting the collection entirely with the
> collections API.
>
> I've opened an issue for an idea to achieve what I *think* you're after:
>
> https://issues.apache.org/jira/browse/SOLR-11751
>
> Thanks,
> Shawn
>


RE: Protect a collection to be deleted

2017-12-12 Thread Yago Riveiro
I don’t know if it’s possible, but if we can mark the collection as protected,
we can prevent the DELETE command from removing the collection.

Maybe set the flag when the CREATE command is executed?

This is an interesting feature to avoid human errors, and relatively easy to
implement.

Regards

--

Yago Riveiro

On 12 Dec 2017 17:45 +, Markus Jelsma <markus.jel...@openindex.io>, wrote:
> Hello,
>
> Well, you could remove the write permission for all segment files. Or, make a 
> custom UpdateProcessor that intercepts *:* operations and stops a delete in 
> its tracks. This is what we did, protect the search against me. Keep in mind 
> that a negative query can also delete everything, so you can check if the 
> numRows of the proposed delete query is equals to the number of documents.
>
> Regards,
> Markus
>
> -Original message-
> > From:Yago Riveiro <yago.rive...@gmail.com
> > Sent: Tuesday 12th December 2017 18:28
> > To: solr-user@lucene.apache.org
> > Subject: Protect a collection to be deleted
> >
> > Hi,
> >
> > Is it possible in Solr protect a collection to be deleted through a
> > property?
> >
> > Regards
> >
> >
> >
> >
> > -
> > Best regards
> >
> > /Yago
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >
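
A minimal sketch of the update processor Markus suggests above, with a
hypothetical class name; note it only guards delete-by-query inside the
collection (and not negative queries, as Markus points out), and it does not
stop a Collections API DELETE, which is the case Shawn's Jira issue addresses:

import java.io.IOException;
import org.apache.solr.common.SolrException;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.DeleteUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

// hypothetical factory name; wire it into the update chain in solrconfig.xml
public class BlockMatchAllDeleteFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processDelete(DeleteUpdateCommand cmd) throws IOException {
        // only guard delete-by-query; deletes by id pass through untouched
        if (!cmd.isDeleteById() && cmd.getQuery() != null
            && "*:*".equals(cmd.getQuery().trim())) {
          throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
              "match-all deletes are blocked on this collection");
        }
        super.processDelete(cmd);
      }
    };
  }
}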


Protect a collection to be deleted

2017-12-12 Thread Yago Riveiro
Hi, 

Is it possible in Solr to protect a collection from being deleted through a
property?

Regards
 



-
Best regards

/Yago
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr 6.5.1 process crash after jshort_disjoint_arraycopy error

2017-11-15 Thread Yago Riveiro
Nope,

I never found a fix for this problem, sorry.

Regards.

--

Yago Riveiro

On 15 Nov 2017 09:44 +, tothis <toth.ist...@danubiusinfo.hu>, wrote:
> Hi Yago,
>
> we are facing the same problem. Could you solve it somehow?
>
> thx
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: How many collections in a solrcloud are too many, how to determine this?

2017-08-09 Thread Yago Riveiro
I have a cluster (12 nodes) with 664 collections, 12 shards each and replication
factor 2.

The main bottleneck will be ZooKeeper: it’s too easy to overflow the overseer
queue when a node is ejected due to a GC pause. Another problem is that the time
to restart a node will increase from seconds to minutes.

The tradeoff is not easy; it depends on the number of machines, the volume of
data, the hardware and so on.

--

/Yago Riveiro

On 8 Aug 2017 20:27 +0100, Webster Homer <webster.ho...@sial.com>, wrote:
> Yes we do see replicas go into recovery.
>
> Most of our clouds are hosted in the google cloud. So flaky networks are
> probably not an issue, though firewalls to the clouds can be
>
> On Tue, Aug 8, 2017 at 2:14 PM, Erick Erickson <erickerick...@gmail.com
> wrote:
>
> > So in total you have 56 replicas, correct? This shouldn't be a
> > problem, we've seen many more replicas than that. Many many many.
> >
> > Do you ever see any replicas go into recovery? One common problem is
> > that GC exceeds the timeouts for, say, Zookeeper to contact nodes and
> > they'll cycle through recovery. Have you captured the GC logs and seen
> > if you have large stop-the-world GC pauses?
> >
> > In short, what you've described should be easily handled. My guess is
> > GC pauses, I/O contention and/or flaky networks
> >
> > Best,
> > Erick
> >
> > On Tue, Aug 8, 2017 at 11:35 AM, Webster Homer <webster.ho...@sial.com
> > wrote:
> > > We have a Solrcloud environments that have 4 solr nodes and a 3 node
> > > Zookeeper ensemble. All of the collections are configured to have 2
> > shards
> > > with 2 replicas. In this environment we have 14 different collections.
> > Some
> > > of these collections are hardly touched others have a fairly heavy search
> > > and update load.
> > > 1 collection his near real time updates every minutes and constant
> > > searches, but it is not very large
> > > another has a fairly constant search load with updates of a few records
> > > every 15 minutes. 6 collections are search heavy but update light (1 full
> > > load per week with daily partials)
> > >
> > > Updates to production cloud are via CDCR from an "authoring" cloud which
> > > replicates to two production clouds.
> > > We often see issues with replicas not being updated, and tlogs
> > accumulating.
> > >
> > > We have autoCommit and autoSoftCommit set on all our collections, and
> > CDCR
> > > logs disabled. We are running Solr 6.2
> > >
> > > We also run into errors saying that "no live solr Servers available to
> > > service the request" but all nodes appear healthy. So I've been
> > wondering
> > > if we just have too many collections for the number of nodes.
> > >
> > > Are there tell tale diagnostics that could determine if the servers are
> > > over loaded?
> > >
> > > Are there any guidelines for number of collections vs number of nodes in
> > a
> > > solrcloud?
> > >
> > > We run our zookeepers via supervisord, and all of this is behind
> > firewalls.
> > > So the Zookeeper JMX interface is useless. How do we get good diagnostics
> > > from Zookeeper? I know that sometimes problems go away when we restart
> > the
> > > Zookeepers and the solr nodes.
> > >
> > > Thanks
> > >
> > > --
> > >
> > >
> > > This message and any attachment are confidential and may be privileged or
> > > otherwise protected from disclosure. If you are not the intended
> > recipient,
> > > you must not copy this message or attachment or disclose the contents to
> > > any other person. If you have received this transmission in error, please
> > > notify the sender immediately and delete the message and any attachment
> > > from your system. Merck KGaA, Darmstadt, Germany and any of its
> > > subsidiaries do not accept liability for any omissions or errors in this
> > > message which may arise as a result of E-Mail-transmission or for damages
> > > resulting from any unauthorized changes of the content of this message
> > and
> > > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> > > subsidiaries do not guarantee that this message is free of viruses and
> > does
> > > not accept liability for any damages caused by any virus transmitted
> > > therewith.
> > >
> > > Click http://www.emdgroup.com/disclaimer to access the German, French,
> > > 

Re: IndexReaders cannot exceed 2 Billion

2017-08-07 Thread Yago Riveiro
You have hit the maximum number of docs in a single shard.

If I'm not wrong, the only solution is to split the index into more shards (if
you are running SolrCloud mode).
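
If the index were running in SolrCloud mode, the split could be driven from
SolrJ; a rough sketch against the 6.x API (the ZooKeeper connect string,
collection and shard names are placeholders, not from this thread):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class SplitBigShard {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder()
            .withZkHost("zk1:2181,zk2:2181,zk3:2181/solr").build()) {
      // SPLITSHARD divides shard1 into two sub-shards, each holding roughly half the documents
      CollectionAdminRequest.SplitShard split =
          CollectionAdminRequest.splitShard("mycollection").setShardName("shard1");
      System.out.println(split.process(client));
    }
  }
}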

--

/Yago Riveiro

On 7 Aug 2017, 16:48 +0100, Wael Kader <w...@softech-lb.com>, wrote:
> Hello,
>
> I faced an issue that is making me go crazy.
> I am running SOLR saving data on HDFS and I have a single node setup with
> an index that has been running fine until today.
> I know that 2 billion documents is too much on a single node but it has
> been running fine for my requirements and it was pretty fast.
>
> I restarted SOLR today and I am getting an error stating "Too many
> documents, composite IndexReaders cannot exceed 2147483519.
> The last backup I have is 2 weeks back and I really need the index to start
> to get the data from the index.
>
> Please help !
> --
> Regards,
> Wael


Re: Truncated chunk in CloudSolrStream

2017-05-25 Thread Yago Riveiro
Nope, this has happened since 6.3.0 (when I started using CloudSolrStream); now
I’m using 6.5.1 code.

Normally this happens with streams of more than 4M documents.

Could it be network related? Is there any TTL in CloudSolrStream at the
connection level?

--

/Yago Riveiro

On 25 May 2017 13:14 +0100, Joel Bernstein <joels...@gmail.com>, wrote:
> I've never seen this error. Is this something you just started seeing
> recently?
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, May 25, 2017 at 7:10 AM, Yago Riveiro <yago.rive...@gmail.com
> wrote:
>
> > I have a process that uses the CloudSolrStream to run a streaming
> > expression
> > and I can see this exception frequently:
> >
> > Caused by: org.apache.http.TruncatedChunkException: Truncated chunk (
> > expected size: 32768; actual size: 1100)
> > at
> > org.apache.http.impl.io.ChunkedInputStream.read(
> > ChunkedInputStream.java:200)
> > ~[supernova-2.4.0.jar:?]
> > at
> > org.apache.http.conn.EofSensorInputStream.read(
> > EofSensorInputStream.java:137)
> > ~[supernova-2.4.0.jar:?]
> > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
> > ~[?:1.8.0_121]
> > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
> > ~[?:1.8.0_121]
> > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
> > ~[?:1.8.0_121]
> > at java.io.InputStreamReader.read(InputStreamReader.java:184)
> > ~[?:1.8.0_121]
> > at org.noggit.JSONParser.fill(JSONParser.java:196)
> > ~[supernova-2.4.0.jar:?]
> > at org.noggit.JSONParser.getMore(JSONParser.java:203)
> > ~[supernova-2.4.0.jar:?]
> > at org.noggit.JSONParser.readStringChars2(JSONParser.java:646)
> > ~[supernova-2.4.0.jar:?]
> > at org.noggit.JSONParser.readStringChars(JSONParser.java:626)
> > ~[supernova-2.4.0.jar:?]
> > at org.noggit.JSONParser.getStringChars(JSONParser.java:1029)
> > ~[supernova-2.4.0.jar:?]
> > at org.noggit.JSONParser.getString(JSONParser.java:1017)
> > ~[supernova-2.4.0.jar:?]
> > at org.noggit.ObjectBuilder.getString(ObjectBuilder.java:68)
> > ~[supernova-2.4.0.jar:?]
> > at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:51)
> > ~[supernova-2.4.0.jar:?]
> > at org.noggit.ObjectBuilder.getObject(ObjectBuilder.java:128)
> > ~[supernova-2.4.0.jar:?]
> > at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:57)
> > ~[supernova-2.4.0.jar:?]
> > at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:37)
> > ~[supernova-2.4.0.jar:?]
> > at
> > org.apache.solr.client.solrj.io.stream.JSONTupleStream.
> > next(JSONTupleStream.java:85)
> > ~[supernova-2.4.0.jar:?]
> > at
> > org.apache.solr.client.solrj.io.stream.SolrStream.read(
> > SolrStream.java:207)
> > ~[supernova-2.4.0.jar:?]
> >
> > The code snip running the stream seems like this:
> >
> > CloudSolrStream cstream = new CloudSolrStream()
> > cstream.open();
> >
> > while (true) {
> > Tuple tuple = cstream.read();
> > ..
> > }
> >
> > Regards,
> >
> >
> >
> >
> > -
> > Best regards
> >
> > /Yago
> > --
> > View this message in context: http://lucene.472066.n3.
> > nabble.com/Truncated-chunk-in-CloudSolrStream-tp4337181.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >


Truncated chunk in CloudSolrStream

2017-05-25 Thread Yago Riveiro
I have a process that uses the CloudSolrStream to run a streaming expression
and I can see this exception frequently:

Caused by: org.apache.http.TruncatedChunkException: Truncated chunk (
expected size: 32768; actual size: 1100)
at
org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:200)
~[supernova-2.4.0.jar:?]
at
org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
~[supernova-2.4.0.jar:?]
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
~[?:1.8.0_121]
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
~[?:1.8.0_121]
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
~[?:1.8.0_121]
at java.io.InputStreamReader.read(InputStreamReader.java:184)
~[?:1.8.0_121]
at org.noggit.JSONParser.fill(JSONParser.java:196)
~[supernova-2.4.0.jar:?]
at org.noggit.JSONParser.getMore(JSONParser.java:203)
~[supernova-2.4.0.jar:?]
at org.noggit.JSONParser.readStringChars2(JSONParser.java:646)
~[supernova-2.4.0.jar:?]
at org.noggit.JSONParser.readStringChars(JSONParser.java:626)
~[supernova-2.4.0.jar:?]
at org.noggit.JSONParser.getStringChars(JSONParser.java:1029)
~[supernova-2.4.0.jar:?]
at org.noggit.JSONParser.getString(JSONParser.java:1017)
~[supernova-2.4.0.jar:?]
at org.noggit.ObjectBuilder.getString(ObjectBuilder.java:68)
~[supernova-2.4.0.jar:?]
at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:51)
~[supernova-2.4.0.jar:?]
at org.noggit.ObjectBuilder.getObject(ObjectBuilder.java:128)
~[supernova-2.4.0.jar:?]
at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:57)
~[supernova-2.4.0.jar:?]
at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:37)
~[supernova-2.4.0.jar:?]
at
org.apache.solr.client.solrj.io.stream.JSONTupleStream.next(JSONTupleStream.java:85)
~[supernova-2.4.0.jar:?]
at
org.apache.solr.client.solrj.io.stream.SolrStream.read(SolrStream.java:207)
~[supernova-2.4.0.jar:?]

The code snippet running the stream looks like this:

CloudSolrStream cstream = new CloudSolrStream()
cstream.open();

while (true) {
Tuple tuple = cstream.read();
..
}
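
For reference, a more complete, self-contained version of that loop might look
like this (a sketch only: the ZooKeeper connect string, collection name and
field list are placeholders, and depending on the exact 6.x release the
Map-based constructor may be deprecated in favour of a SolrParams one). The
EOF tuple is what terminates the read loop:

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;

public class StreamAllIds {
  public static void main(String[] args) throws Exception {
    Map<String, String> props = new HashMap<>();
    props.put("q", "*:*");
    props.put("fl", "id");
    props.put("sort", "id asc");
    props.put("qt", "/export");   // stream the full result set through the export handler

    CloudSolrStream cstream = new CloudSolrStream("zk1:2181,zk2:2181/solr", "data", props);
    try {
      cstream.open();
      while (true) {
        Tuple tuple = cstream.read();
        if (tuple.EOF) {          // the stream signals completion with an EOF tuple
          break;
        }
        System.out.println(tuple.getString("id"));
      }
    } finally {
      cstream.close();
    }
  }
}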

Regards,




-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Truncated-chunk-in-CloudSolrStream-tp4337181.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: LukeRequestHandler not returning all fields in the index

2017-05-22 Thread Yago Riveiro
Ok ... then I have no way to know the full list of fields in my collection
without doing a LukeRequest to all of the shards and merging the results at the
end, right?

Streaming expressions don't allow the * wildcard, and the LukeRequest doesn't
return all fields ... there is no way to pull all the data from a collection in
a simple programmatic way :/

Thanks for the answer, Erick.
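
That per-shard merge can be scripted; a rough SolrJ sketch (connect string and
collection name are placeholders) that unions the field names reported by
/admin/luke on every replica core:

import java.util.Set;
import java.util.TreeSet;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.cloud.Replica;

public class CollectionFieldNames {
  public static void main(String[] args) throws Exception {
    Set<String> allFields = new TreeSet<>();
    try (CloudSolrClient cloud = new CloudSolrClient.Builder()
            .withZkHost("zk1:2181,zk2:2181/solr").build()) {
      cloud.connect();
      DocCollection coll = cloud.getZkStateReader().getClusterState().getCollection("collection");
      for (Replica replica : coll.getReplicas()) {
        // /admin/luke is a per-core handler, so query every replica core directly
        String coreUrl = replica.getStr("base_url") + "/" + replica.getStr("core");
        try (HttpSolrClient core = new HttpSolrClient.Builder(coreUrl).build()) {
          LukeRequest luke = new LukeRequest();
          luke.setNumTerms(0);    // field names only, skip the top-terms payload
          allFields.addAll(luke.process(core).getFieldInfo().keySet());
        }
      }
    }
    System.out.println(allFields);
  }
}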



-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/LukeRequestHandler-not-returning-all-fields-in-the-index-tp4336287p4336332.html
Sent from the Solr - User mailing list archive at Nabble.com.


LukeRequestHandler not returning all fields in the index

2017-05-22 Thread Yago Riveiro
I'm struggling with a situation that I think could be a bug.

The LukeRequestHandler is not returning all the fields that exist in a
collection with 12 shards on 12 nodes (1 shard on each node).

Running the request "http://localhost:8983/solr/collection/admin/luke" on each
node, the list of fields is the same except on one node. The difference is that
one shard has a document with dynamic fields that don't exist in the other
shards.

Is this the normal behaviour, or should it return all fields from all shards?

Regards



-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/LukeRequestHandler-not-returning-all-fields-in-the-index-tp4336287.html
Sent from the Solr - User mailing list archive at Nabble.com.


Couldn't decorate docValues for field message in logs

2017-05-05 Thread Yago Riveiro
Hi,

I have a field type in my schema configured as:



The goal of this field type is to allow fields to be faceted and to display the
data; the field being searchable is not a requirement.

While I'm indexing data I get this annoying warning in the logs:

Couldn't decorate docValues for field: [field1_s], schemaField:
[field1_s{type=string,properties=omitNorms,omitTermFreqAndPositions,sortMissingLast,docValues,useDocValuesAsStored}]

What does this warning mean?

Another thing that I found odd is that fields with this field type are
currently searchable with indexed="false".



-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Couldn-t-decorate-docValues-for-field-message-in-logs-tp4333505.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Export endpoint broken in solr 6.5.1?

2017-05-05 Thread Yago Riveiro
Joel,

Thank for the advice, indeed the /export handler was referenced in the
config.

The streaming expression is working.



-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Export-endpoint-broken-in-solr-6-5-1-tp4333416p4333504.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Export endpoint broken in solr 6.5.1?

2017-05-04 Thread Yago Riveiro
It's an older build that was upgraded from 6.3.0 to 6.5.1.

The configs used in 6.3.0 are the same ones used in 6.5.1, without changes.

Should I update my configs?

--

/Yago Riveiro

On 4 May 2017, 21:45 +0100, Joel Bernstein <joels...@gmail.com>, wrote:
> Did this error come from a standard 6.5.1 build, or form a build that was
> upgraded to 6.5.1 with older config files?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, May 4, 2017 at 1:57 PM, Yago Riveiro <yago.rive...@gmail.com> wrote:
>
> > I'm trying to run this streaming expression
> >
> > search(data,qt="/export",q="*:*",fl="id",sort="id asc")
> >
> > and I'm hitting this exception:
> >
> > 2017-05-04 17:24:05.156 ERROR (qtp1937348256-378) [c:data s:shard7
> > r:core_node38 x:data_shard7_replica1] o.a.s.c.s.i.s.ExceptionStream
> > java.io.IOException: java.util.concurrent.ExecutionException:
> > java.io.IOException: --> http://solr-node-1:8983/solr/
> > data_shard2_replica1/:
> > An exception has occurred on the server, refer to server log for details.
> > at
> > org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> > openStreams(CloudSolrStream.java:451)
> > at
> > org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> > open(CloudSolrStream.java:308)
> > at
> > org.apache.solr.client.solrj.io.stream.ExceptionStream.
> > open(ExceptionStream.java:51)
> > at
> > org.apache.solr.handler.StreamHandler$TimerStream.
> > open(StreamHandler.java:490)
> > at
> > org.apache.solr.client.solrj.io.stream.TupleStream.
> > writeMap(TupleStream.java:78)
> > at
> > org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
> > at
> > org.apache.solr.response.TextResponseWriter.writeVal(
> > TextResponseWriter.java:193)
> > at
> > org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(
> > JSONResponseWriter.java:209)
> > at
> > org.apache.solr.response.JSONWriter.writeNamedList(
> > JSONResponseWriter.java:325)
> > at
> > org.apache.solr.response.JSONWriter.writeResponse(
> > JSONResponseWriter.java:120)
> > at
> > org.apache.solr.response.JSONResponseWriter.write(
> > JSONResponseWriter.java:71)
> > at
> > org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(
> > QueryResponseWriterUtil.java:65)
> > at
> > org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:809)
> > at org.apache.solr.servlet.HttpSolrCall.call(
> > HttpSolrCall.java:538)
> > at
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > SolrDispatchFilter.java:347)
> > at
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > SolrDispatchFilter.java:298)
> > at
> > org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> > doFilter(ServletHandler.java:1691)
> > at
> > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> > at
> > org.eclipse.jetty.server.handler.ScopedHandler.handle(
> > ScopedHandler.java:143)
> > at
> > org.eclipse.jetty.security.SecurityHandler.handle(
> > SecurityHandler.java:548)
> > at
> > org.eclipse.jetty.server.session.SessionHandler.
> > doHandle(SessionHandler.java:226)
> > at
> > org.eclipse.jetty.server.handler.ContextHandler.
> > doHandle(ContextHandler.java:1180)
> > at
> > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> > at
> > org.eclipse.jetty.server.session.SessionHandler.
> > doScope(SessionHandler.java:185)
> > at
> > org.eclipse.jetty.server.handler.ContextHandler.
> > doScope(ContextHandler.java:1112)
> > at
> > org.eclipse.jetty.server.handler.ScopedHandler.handle(
> > ScopedHandler.java:141)
> > at
> > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
> > ContextHandlerCollection.java:213)
> > at
> > org.eclipse.jetty.server.handler.HandlerCollection.
> > handle(HandlerCollection.java:119)
> > at
> > org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> > HandlerWrapper.java:134)
> > at
> > org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(
> > RewriteHandler.java:335)
> > at
> > org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> > HandlerWrapper.java:134)
> > at org.eclipse.jetty.server.Server.handle(Server.java:534)
> > at org.eclipse.jetty.server.HttpChannel.handle(
> > HttpChannel.java:320)
> > at
> > org.eclipse.jetty.server.HttpConnection.onFillable(
> > HttpConnection.java:251

Export endpoint broken in solr 6.5.1?

2017-05-04 Thread Yago Riveiro
I'm trying to run this streaming expression

search(data,qt="/export",q="*:*",fl="id",sort="id asc")

and I'm hitting this exception:

2017-05-04 17:24:05.156 ERROR (qtp1937348256-378) [c:data s:shard7
r:core_node38 x:data_shard7_replica1] o.a.s.c.s.i.s.ExceptionStream
java.io.IOException: java.util.concurrent.ExecutionException:
java.io.IOException: --> http://solr-node-1:8983/solr/data_shard2_replica1/:
An exception has occurred on the server, refer to server log for details.
at
org.apache.solr.client.solrj.io.stream.CloudSolrStream.openStreams(CloudSolrStream.java:451)
at
org.apache.solr.client.solrj.io.stream.CloudSolrStream.open(CloudSolrStream.java:308)
at
org.apache.solr.client.solrj.io.stream.ExceptionStream.open(ExceptionStream.java:51)
at
org.apache.solr.handler.StreamHandler$TimerStream.open(StreamHandler.java:490)
at
org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:78)
at
org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
at
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:193)
at
org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209)
at
org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325)
at
org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120)
at
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71)
at
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
at
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:809)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:538)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:298)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: -->
http://solr-node-1:8983/solr/data_shard2_replica1/: An exception has
occurred on the server, refer to server log for details.
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at
org.apache.solr.client.solrj.io.stream.CloudSolrStream.openStreams(CloudSolrStream.java:445)
... 42 more
Caused by: java.io.IOException: -->

Solr 6.5.1 process crash after jshort_disjoint_arraycopy error

2017-05-03 Thread Yago Riveiro
Hi,

I'm running 6.5.1 using Java 8 build 1.8.0_131-b11 and Solr's process crashed
with this log:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x7fd2c87ea014, pid=4468, tid=0x7fd1f487e700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_131-b11) (build
1.8.0_131-b11)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode
linux-amd64 compressed oops)
# Problematic frame:
# v  ~StubRoutines::jshort_disjoint_arraycopy
#
# Failed to write core dump. Core dumps have been disabled. To enable core
dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /opt/solr/solr-6.5.1/server/hs_err_pid4468.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp


Any idea what is going on?



-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-6-5-1-process-crash-after-jshort-disjoint-arraycopy-error-tp4333150.html
Sent from the Solr - User mailing list archive at Nabble.com.


Aliases feature scales?

2017-04-19 Thread Yago Riveiro
Hi, 

Does anyone know if there is any theoretical limit on the number of aliases
that a Solr cluster can handle?

If I create something like 10K aliases, would I experience any kind of bottleneck?

Regards





-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Aliases-feature-scales-tp4330721.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 6.3.0, possible SYN flooding on port 8983. Sending cookies.

2017-03-04 Thread Yago Riveiro
I’m using Guzzle 3 for HTTP (it’s old, but it’s the only one that works on PHP 5.3)
and the documentation says it uses persistent connections (but you know … it’s
PHP, weird things happen).

Maybe I need to dump the data to disk and use Java to post it ...
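
A rough sketch of the "post it from Java" route with SolrJ, which keeps a pooled
keep-alive connection underneath instead of opening one TCP connection per
request the way a naive curl loop does (the connect string, collection name and
the data loop are placeholders):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder()
            .withZkHost("zk1:2181,zk2:2181/solr").build()) {
      client.setDefaultCollection("collection");
      List<SolrInputDocument> batch = new ArrayList<>();
      for (int i = 0; i < 100_000; i++) {        // hypothetical data source
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", Integer.toString(i));
        batch.add(doc);
        if (batch.size() == 1000) {              // one request per 1000 docs, on a reused connection
          client.add(batch);
          batch.clear();
        }
      }
      if (!batch.isEmpty()) {
        client.add(batch);
      }
      client.commit();
    }
  }
}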

--

/Yago Riveiro

On 4 Mar 2017 16:50 +, Walter Underwood <wun...@wunderwood.org>, wrote:
> PHP uses the curl library for HTTP. It is a bit of a mess. It opens a new 
> connection for every request.
>
> I would not try to pool client connections with PHP. PHP starts over with a 
> new environment for each page, so that will be very hard to manage.
>
> I would suggest running haproxy or something similar on the same host as PHP. 
> Connect to it locally, and let it pool connections to Solr. That will use 
> Unix-local connections that don’t actually run TCP.
>
> Really, don’t try to fix networking inside PHP.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>
> > On Mar 4, 2017, at 2:32 AM, Mikhail Khludnev <m...@apache.org> wrote:
> >
> > I hardly can comment regarding PHP. But if you call curl as an external
> > program it.s a dead end. However, giving
> > http://stackoverflow.com/questions/972925/persistent-keepalive-http-with-the-php-curl-library
> > you can reuse a 'context' across curl library calls and make sure that's
> > keep-alive pool is large enough for your app (which btw, rarely true for
> > java.net.URL, where it's just 5 connections). Happy pooling!
> >
> > 04 марта 2017 г. 12:25 пользователь "Yago Riveiro" <yago.rive...@gmail.com
> > написал:
> >
> > > Hi Mikhail,
> > >
> > > I’m not using SSL, and the way I call Solr is through a php script that
> > > use Curl
> > >
> > > --
> > >
> > > /Yago Riveiro
> > >
> > > On 4 Mar 2017 08:54 +, Mikhail Khludnev <gge...@gmail.com>, wrote:
> > > > Hello, Yago.
> > > > It usually happens when client doesn't reuse http connections. How do 
> > > > you
> > > > call Solr? Is there SSL?
> > > >
> > > > 04 марта 2017 г. 3:33 пользователь "Yago Riveiro" <
> > > yago.rive...@gmail.com
> > > > написал:
> > > >
> > > > > Hello,
> > > > >
> > > > > I have this log in my dmesg: possible SYN flooding on port 8983.
> > > Sending
> > > > > cookies.
> > > > >
> > > > > The Solr instance (6.3.0) has not accepting more http connections.
> > > > >
> > > > > I ran this: _lsof -nPi |grep \:8983 | wc -l_ and the number of
> > > connection
> > > > > to
> > > > > port 8983 is about 14K in CLOSE_WAIT ou ESTABLISHED state.
> > > > >
> > > > > Any suggestion of what could be the reason?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > /Yago
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > -
> > > > > Best regards
> > > > >
> > > > > /Yago
> > > > > --
> > > > > View this message in context: http://lucene.472066.n3.
> > > > > nabble.com/Solr-6-3-0-possible-SYN-flooding-on-port-
> > > 8983-Sending-cookies-
> > > > > tp4323341.html
> > > > > Sent from the Solr - User mailing list archive at Nabble.com.
> > > > >
> > >
>


Re: Solr 6.3.0, possible SYN flooding on port 8983. Sending cookies.

2017-03-04 Thread Yago Riveiro
The weird thing is that the lsof command shows that the connections are made
between 2 Solr instances, and not from the origin of the new incoming data ...

--

/Yago Riveiro

On 4 Mar 2017 10:32 +, Mikhail Khludnev <m...@apache.org>, wrote:
> I hardly can comment regarding PHP. But if you call curl as an external
> program it.s a dead end. However, giving
> http://stackoverflow.com/questions/972925/persistent-keepalive-http-with-the-php-curl-library
> you can reuse a 'context' across curl library calls and make sure that's
> keep-alive pool is large enough for your app (which btw, rarely true for
> java.net.URL, where it's just 5 connections). Happy pooling!
>
> 04 марта 2017 г. 12:25 пользователь "Yago Riveiro" <yago.rive...@gmail.com
> написал:
>
> > Hi Mikhail,
> >
> > I’m not using SSL, and the way I call Solr is through a php script that
> > use Curl
> >
> > --
> >
> > /Yago Riveiro
> >
> > On 4 Mar 2017 08:54 +, Mikhail Khludnev <gge...@gmail.com>, wrote:
> > > Hello, Yago.
> > > It usually happens when client doesn't reuse http connections. How do you
> > > call Solr? Is there SSL?
> > >
> > > 04 марта 2017 г. 3:33 пользователь "Yago Riveiro" <
> > yago.rive...@gmail.com
> > > написал:
> > >
> > > > Hello,
> > > >
> > > > I have this log in my dmesg: possible SYN flooding on port 8983.
> > Sending
> > > > cookies.
> > > >
> > > > The Solr instance (6.3.0) has not accepting more http connections.
> > > >
> > > > I ran this: _lsof -nPi |grep \:8983 | wc -l_ and the number of
> > connection
> > > > to
> > > > port 8983 is about 14K in CLOSE_WAIT ou ESTABLISHED state.
> > > >
> > > > Any suggestion of what could be the reason?
> > > >
> > > > Thanks,
> > > >
> > > > /Yago
> > > >
> > > >
> > > >
> > > >
> > > > -
> > > > Best regards
> > > >
> > > > /Yago
> > > > --
> > > > View this message in context: http://lucene.472066.n3.
> > > > nabble.com/Solr-6-3-0-possible-SYN-flooding-on-port-
> > 8983-Sending-cookies-
> > > > tp4323341.html
> > > > Sent from the Solr - User mailing list archive at Nabble.com.
> > > >
> >


Re: Solr 6.3.0, possible SYN flooding on port 8983. Sending cookies.

2017-03-04 Thread Yago Riveiro
Hi Mikhail,

I’m not using SSL, and the way I call Solr is through a PHP script that uses curl.

--

/Yago Riveiro

On 4 Mar 2017 08:54 +, Mikhail Khludnev <gge...@gmail.com>, wrote:
> Hello, Yago.
> It usually happens when client doesn't reuse http connections. How do you
> call Solr? Is there SSL?
>
> 04 марта 2017 г. 3:33 пользователь "Yago Riveiro" <yago.rive...@gmail.com
> написал:
>
> > Hello,
> >
> > I have this log in my dmesg: possible SYN flooding on port 8983. Sending
> > cookies.
> >
> > The Solr instance (6.3.0) has not accepting more http connections.
> >
> > I ran this: _lsof -nPi |grep \:8983 | wc -l_ and the number of connection
> > to
> > port 8983 is about 14K in CLOSE_WAIT ou ESTABLISHED state.
> >
> > Any suggestion of what could be the reason?
> >
> > Thanks,
> >
> > /Yago
> >
> >
> >
> >
> > -
> > Best regards
> >
> > /Yago
> > --
> > View this message in context: http://lucene.472066.n3.
> > nabble.com/Solr-6-3-0-possible-SYN-flooding-on-port-8983-Sending-cookies-
> > tp4323341.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >


Solr 6.3.0, possible SYN flooding on port 8983. Sending cookies.

2017-03-03 Thread Yago Riveiro
Hello, 

I have this log in my dmesg: possible SYN flooding on port 8983. Sending
cookies.

The Solr instance (6.3.0) is not accepting more HTTP connections.

I ran this: _lsof -nPi | grep \:8983 | wc -l_ and the number of connections to
port 8983 is about 14K in CLOSE_WAIT or ESTABLISHED state.

Any suggestion of what could be the reason?

Thanks,

/Yago




-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-6-3-0-possible-SYN-flooding-on-port-8983-Sending-cookies-tp4323341.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [Benchmark SOLR] JETTY VS TOMCAT

2017-01-27 Thread Yago Riveiro
Solr runs its tests with Jetty.

I ran into nasty bugs with Solr on Tomcat in the past.

My advice is that speed is only one metric; robustness and reliability matter
too.

--

/Yago Riveiro

On 27 Jan 2017 15:38 +, William Bell <billnb...@gmail.com>, wrote:
> Did you try:
>
> Set your acceptor count, SelectChannelConnector.setAcceptors(int)
> <http://download.eclipse.org/jetty/stable-7/apidocs/org/eclipse/jetty/server/AbstractConnector.html#setAcceptors%28int%29
> to
> be a a value between 1 and (number_of_cpu_cores - 1).
>
> On Fri, Jan 27, 2017 at 3:22 AM, Gerald Reinhart <gerald.reinh...@kelkoo.com
> > wrote:
>
> > Hello,
> >
> > We are migrating our platform
> > from
> > - Solr 5.4.1 hosted by a Tomcat
> > to
> > - Solr 5.4.1 standalone (hosted by Jetty)
> >
> > => Jetty is 15% slower than Tomcat in the same conditions.
> >
> >
> > Here are details about the benchmarks :
> >
> > Context :
> > - Index with 9 000 000 documents
> > - Gatling launch queries extracted from the real traffic
> > - Server : R410 with 16 virtual CPU and 96G mem
> >
> > Results with 20 clients in // during 10 minutes:
> > For Tomcat :
> > - 165 Queries per seconds
> > - 120ms mean response time
> >
> > For Jetty :
> > - 139 Queries per seconds
> > - 142ms mean response time
> >
> > We have checked :
> > - the load of the server => same
> > - the io wait => same
> > - the memory used in the JVM => same
> > - JVM GC settings => same
> >
> > For us, it's a blocker for the migration.
> >
> > Is it a known issue ? (I found that :
> > http://www.asjava.com/jetty/jetty-vs-tomcat-performance-comparison/)
> >
> > How can we improve the performance of Jetty ? (We have already
> > followed
> > http://www.eclipse.org/jetty/documentation/9.2.21.v20170120/
> > optimizing.html
> > recommendation)
> >
> > Many thanks,
> >
> >
> > Gérald Reinhart
> >
> >
> > Kelkoo SAS
> > Société par Actions Simplifiée
> > Au capital de € 4.168.964,30
> > Siège social : 158 Ter Rue du Temple 75003 Paris
> > 425 093 069 RCS Paris
> >
> > Ce message et les pièces jointes sont confidentiels et établis à
> > l'attention exclusive de leurs destinataires. Si vous n'êtes pas le
> > destinataire de ce message, merci de le détruire et d'en avertir
> > l'expéditeur.
> >
>
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076


Re: Streams return default values for fields that doesn't exist in the document

2017-01-21 Thread Yago Riveiro
6.3.0

I will try again with 6.4.0

Thanks, Erick.

--

/Yago Riveiro

On 21 Jan 2017, 21:23 +, Erick Erickson <erickerick...@gmail.com>, wrote:
> What version of Solr? See: https://issues.apache.org/jira/browse/SOLR-9166
>
> Best,
> Erick
>
> On Sat, Jan 21, 2017 at 1:08 PM, Yago Riveiro <yago.rive...@gmail.com> wrote:
> > I'm trying to use the streaming API to reindex data from one collection to
> > another.
> >
> > I have a lot of dynamic fields on my documents and not every document has
> > the same fields, therefore, to fetch the list if fields that exists in the
> > collection, I need to run a luke query to fetch all of them.
> >
> > I ran the stream with the fl with all the fields returned by the luke query
> > an all documents returned has the same fields, so far so good.
> >
> > The main problem is that the field doesn't exist in the returned document is
> > filled with the default value of the field type, not ok.
> >
> > If the field doesn't exist in the document the return value can't be the
> > default value, should be something that we can identify as "this field
> > doesn't exists in this document"
> >
> > Right now I have a docs with 2 integer fields with value 0, one indeed
> > belongs to the document and was indexed with 0 as the correct value, the
> > other doesn't exists in the source document.
> >
> > Why not return the value as null? when indexing a field with null value is
> > ignored, the reverse operation should returns the same ...
> >
> >
> >
> > -
> > Best regards
> >
> > /Yago
> > --
> > View this message in context: 
> > http://lucene.472066.n3.nabble.com/Streams-return-default-values-for-fields-that-doesn-t-exist-in-the-document-tp4315229.html
> > Sent from the Solr - User mailing list archive at Nabble.com.


Streams return default values for fields that doesn't exist in the document

2017-01-21 Thread Yago Riveiro
I'm trying to use the streaming API to reindex data from one collection to
another.

I have a lot of dynamic fields in my documents and not every document has the
same fields; therefore, to fetch the list of fields that exist in the
collection, I need to run a Luke query to fetch all of them.

I ran the stream with fl set to all the fields returned by the Luke query, and
all the documents returned have the same fields, so far so good.

The main problem is that a field that doesn't exist in the returned document is
filled with the default value of the field type, which is not ok.

If the field doesn't exist in the document, the return value can't be the
default value; it should be something that we can identify as "this field
doesn't exist in this document".

Right now I have docs with 2 integer fields with value 0: one indeed belongs to
the document and was indexed with 0 as the correct value, the other doesn't
exist in the source document.

Why not return the value as null? When indexing, a field with a null value is
ignored; the reverse operation should return the same ...



-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Streams-return-default-values-for-fields-that-doesn-t-exist-in-the-document-tp4315229.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: CloudSolrStream can't set the setZkClientTimeout and setZkConnectTimeout properties

2017-01-19 Thread Yago Riveiro
I can see some reconnects in my logs; the process of consuming the stream
doesn't break and continues as normal.

The timeout is 10s, but I can see in the logs that the reconnect is triggered
after 6s. I don't know if that's the default behaviour or whether the ZK
timeout is not being honoured.



-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/CloudSolrStream-can-t-set-the-setZkClientTimeout-and-setZkConnectTimeout-properties-tp4313127p4314899.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Question about Lucene FieldCache

2017-01-09 Thread Yago Riveiro
Ok, then I need to configure it to reduce the size of the cache.

Thanks for the help, Mikhail.

--

/Yago Riveiro

On 9 Jan 2017 17:01 +, Mikhail Khludnev <m...@apache.org>, wrote:
> This probably says why
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/core/SolrConfig.java#L258
>
> On Mon, Jan 9, 2017 at 4:41 PM, Yago Riveiro <yago.rive...@gmail.com> wrote:
>
> > The documentation says that the only caches configurable are:
> >
> > - filterCache
> > - queryResultCache
> > - documentCache
> > - user defined caches
> >
> > There is no entry for fieldValueCache and in my case all of list in the
> > documentation are disable ...
> >
> > --
> >
> > /Yago Riveiro
> >
> > On 9 Jan 2017 13:20 +, Mikhail Khludnev <m...@apache.org>, wrote:
> > > On Mon, Jan 9, 2017 at 2:17 PM, Yago Riveiro <yago.rive...@gmail.com
> > wrote:
> > >
> > > > Thanks for re reply Mikhail,
> > > >
> > > > Do you know if the 1 value is configurable?
> > >
> > > yes. in solrconfig.xml
> > > https://cwiki.apache.org/confluence/display/solr/Query+
> > Settings+in+SolrConfig#QuerySettingsinSolrConfig-Caches
> > > iirc you cant' fully disable it setting size to 0.
> > >
> > >
> > > > My insert rate is so high
> > > > (5000 docs/s) that the cache it's quite useless.
> > > >
> > > > In the case of the Lucene field cache, it's possible "clean" it in some
> > > > way?
> > > >
> > > > Even it would be possible, the first sorting query or so loads it back.
> > >
> > > > Some cache is eating my memory heap.
> > > >
> > > Probably you need to dedicate master which won't load FieldCache.
> > >
> > >
> > > >
> > > >
> > > >
> > > > -
> > > > Best regards
> > > >
> > > > /Yago
> > > > --
> > > > View this message in context: http://lucene.472066.n3.
> > > > nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html
> > > > Sent from the Solr - User mailing list archive at Nabble.com.
> > > >
> > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev


CloudSolrStream can't set the setZkClientTimeout and setZkConnectTimeout properties

2017-01-09 Thread Yago Riveiro
Hi,

Using CloudSolrStream, is it possible to define the setZkConnectTimeout and
setZkClientTimeout of the internal CloudSolrClient?

The default negotiation timeout is set to 10 seconds.

Regards,

/Yago



-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/CloudSolrStream-can-t-set-the-setZkClientTimeout-and-setZkConnectTimeout-properties-tp4313127.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Yago Riveiro
You can try to reindex your data into another collection with more shards.
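
If the setup is (or can be moved to) SolrCloud, that reindex can be done with a
cursorMark loop from the old collection into a new one created with more
shards; a rough sketch with placeholder names (only stored fields are
transferred, and internal fields like _version_ are dropped):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class ReindexWithMoreShards {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder()
            .withZkHost("zk1:2181/solr").build()) {
      String cursor = CursorMarkParams.CURSOR_MARK_START;
      while (true) {
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(1000);
        q.setSort("id", SolrQuery.ORDER.asc);         // cursorMark requires a sort on the uniqueKey
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        QueryResponse rsp = client.query("old_collection", q);

        List<SolrInputDocument> batch = new ArrayList<>();
        for (SolrDocument doc : rsp.getResults()) {
          SolrInputDocument in = new SolrInputDocument();
          for (String field : doc.getFieldNames()) {
            if (!"_version_".equals(field)) {         // drop internal fields before re-adding
              in.addField(field, doc.getFieldValue(field));
            }
          }
          batch.add(in);
        }
        if (!batch.isEmpty()) {
          client.add("new_collection", batch);        // new_collection was created with more shards
        }
        String next = rsp.getNextCursorMark();
        if (next.equals(cursor)) {                    // the cursor stops advancing when the export is done
          break;
        }
        cursor = next;
      }
      client.commit("new_collection");
    }
  }
}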

--

/Yago Riveiro

On 9 Jan 2017 14:15 +, Narsimha Reddy CHALLA <chnredd...@gmail.com>, wrote:
> No, it does not work by splitting. First of all lucene index files are not
> text files. There is a segment_NN file which will refer index files in a
> commit. So, when we split a large index file into smaller ones, the
> corresponding segment_NN file also needs to be updated with new index files
> OR a new segment_NN file should be created, probably.
>
> Can someone who is familiar with lucene index files please help us in this
> regard?
>
> Thanks
> NRC
>
> On Mon, Jan 9, 2017 at 7:38 PM, Manan Sheth <manan.sh...@impetus.co.in
> wrote:
>
> > Is this really works for lucene index files?
> >
> > Thanks,
> > Manan Sheth
> > 
> > From: Moenieb Davids <moenieb.dav...@gpaa.gov.za
> > Sent: Monday, January 9, 2017 7:36 PM
> > To: solr-user@lucene.apache.org
> > Subject: RE: Help needed in breaking large index file into smaller ones
> >
> > Hi,
> >
> > Try split on linux or unix
> >
> > split -l 100 originalfile.csv
> > this will split a file into 100 lines each
> >
> > see other options for how to split like size
> >
> >
> > -Original Message-
> > From: Narsimha Reddy CHALLA [mailto:chnredd...@gmail.com]
> > Sent: 09 January 2017 12:12 PM
> > To: solr-user@lucene.apache.org
> > Subject: Help needed in breaking large index file into smaller ones
> >
> > Hi All,
> >
> > My solr server has a few large index files (say ~10G). I am looking
> > for some help on breaking them it into smaller ones (each < 4G) to satisfy
> > my application requirements. Are there any such tools available?
> >
> > Appreciate your help.
> >
> > Thanks
> > NRC
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > 
> > ===
> > GPAA e-mail Disclaimers and confidential note
> >
> > This e-mail is intended for the exclusive use of the addressee only.
> > If you are not the intended recipient, you should not use the contents
> > or disclose them to any other person. Please notify the sender immediately
> > and delete the e-mail. This e-mail is not intended nor
> > shall it be taken to create any legal relations, contractual or otherwise.
> > Legally binding obligations can only arise for the GPAA by means of
> > a written instrument signed by an authorised signatory.
> > 
> > ===
> >
> > 
> >
> >
> >
> >
> >
> >
> > NOTE: This message may contain information that is confidential,
> > proprietary, privileged or otherwise protected by law. The message is
> > intended solely for the named addressee. If received in error, please
> > destroy and notify the sender. Any use of this email is prohibited when
> > received in error. Impetus does not represent, warrant and/or guarantee,
> > that the integrity of this communication has been maintained nor that the
> > communication is free of errors, virus, interception or interference.
> >


Re: Question about Lucene FieldCache

2017-01-09 Thread Yago Riveiro
The documentation says that the only caches configurable are:

- filterCache
- queryResultCache
- documentCache
- user defined caches

There is no entry for fieldValueCache, and in my case all of the caches listed
in the documentation are disabled ...

--

/Yago Riveiro

On 9 Jan 2017 13:20 +, Mikhail Khludnev <m...@apache.org>, wrote:
> On Mon, Jan 9, 2017 at 2:17 PM, Yago Riveiro <yago.rive...@gmail.com> wrote:
>
> > Thanks for re reply Mikhail,
> >
> > Do you know if the 1 value is configurable?
>
> yes. in solrconfig.xml
> https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig#QuerySettingsinSolrConfig-Caches
> iirc you cant' fully disable it setting size to 0.
>
>
> > My insert rate is so high
> > (5000 docs/s) that the cache it's quite useless.
> >
> > In the case of the Lucene field cache, it's possible "clean" it in some
> > way?
> >
> > Even it would be possible, the first sorting query or so loads it back.
>
> > Some cache is eating my memory heap.
> >
> Probably you need to dedicate master which won't load FieldCache.
>
>
> >
> >
> >
> > -
> > Best regards
> >
> > /Yago
> > --
> > View this message in context: http://lucene.472066.n3.
> > nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev


Re: Question about Lucene FieldCache

2017-01-09 Thread Yago Riveiro
Thanks for the reply Mikhail,

Do you know if the 1 value is configurable? My insert rate is so high
(5000 docs/s) that the cache is quite useless.

In the case of the Lucene FieldCache, is it possible to "clean" it in some way?

Some cache is eating my heap memory.



-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-about-Lucene-FieldCache-tp4313062p4313069.html
Sent from the Solr - User mailing list archive at Nabble.com.


Question about Lucene FieldCache

2017-01-09 Thread Yago Riveiro
Hi,

After some reading of the documentation, supposedly the Lucene FieldCache is
the only cache that cannot be disabled.

Fetching the config for a collection through the REST API, I found an entry
like this:

"query": {
"useFilterForSortedQuery": true,
"queryResultWindowSize": 1,
"queryResultMaxDocsCached": 0,
"enableLazyFieldLoading": true,
"maxBooleanClauses": 8192,
"": {
"size": "1",
"showItems": "-1",
"initialSize": "10",
"name": "fieldValueCache"
}
},

My questions:

- That size, 1 is for all files of the collection schema or is 1 for
each field defined?
- If I reload the collection the caches are wiped?

Regards,

/Yago



-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-about-Lucene-FieldCache-tp4313062.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: CloudSolrStream client doesn't validate sort order

2017-01-07 Thread Yago Riveiro
Ok, good to know :)



-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/CloudSolrStream-client-doesn-t-validate-sort-order-tp4312936p4312943.html
Sent from the Solr - User mailing list archive at Nabble.com.


CloudSolrStream client doesn't validate sort order

2017-01-07 Thread Yago Riveiro
Hi,

The CloudSolrStream client (Solr 6.3.0) assumes that the sort param always
includes the sort order.

Starting at line 326:

String[] sorts = sort.split(",");
StreamComparator[] comps = new StreamComparator[sorts.length];
for(int i=0; i
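
Until the client validates this, the practical workaround is to always pass an
explicit direction (e.g. "id asc"). A hedged sketch of the kind of defensive
parsing the client could do, mirroring that method (a fragment only, not the
actual Solr patch; FieldComparator and ComparatorOrder are the solrj.io
comparator classes):

// default to ascending when the caller passes a bare field name such as sort="id"
static StreamComparator[] parseSorts(String sort) {
  String[] sorts = sort.split(",");
  StreamComparator[] comps = new StreamComparator[sorts.length];
  for (int i = 0; i < sorts.length; i++) {
    String[] spec = sorts[i].trim().split("\\s+");
    String field = spec[0];
    String order = spec.length > 1 ? spec[1] : "asc";   // assume ascending instead of failing later
    comps[i] = new FieldComparator(field,
        "desc".equalsIgnoreCase(order) ? ComparatorOrder.DESCENDING : ComparatorOrder.ASCENDING);
  }
  return comps;
}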

Re: Boolean type supports docValues?

2017-01-03 Thread Yago Riveiro
Reading the current documentation, it is not clear ...

After testing it, 6.3.0 does indeed have docValues support for the boolean type.

Thanks, Erick.

--

/Yago Riveiro

On 3 Jan 2017 10:39 +, Yago Riveiro <yago.rive...@gmail.com>, wrote:
> Hi,
>
> The boolean type has support for DocValues? the documentation says that only
> StrField, UUIDField and Trie* numeric fields have support ( doc
> <https://cwiki.apache.org/confluence/display/solr/DocValues> ) but I found
> this Jira issue SOLR-9187 <https://issues.apache.org/jira/browse/SOLR-9187
> that supposedly implements docValues for boolean type.
>
> The base configs provided with Solr also have not configure the docValues.
>
> Regards,
>
> /Yago
>
>
>
> -
> Best regards
>
> /Yago
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Boolean-type-supports-docValues-tp4312004.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Boolean type supports docValues?

2017-01-03 Thread Yago Riveiro
Hi,

Does the boolean type have support for DocValues? The documentation says that only
StrField, UUIDField and Trie* numeric fields have support
(https://cwiki.apache.org/confluence/display/solr/DocValues), but I found
the Jira issue SOLR-9187 (https://issues.apache.org/jira/browse/SOLR-9187)
that supposedly implements docValues for the boolean type.

The base configs provided with Solr also do not configure docValues for it.

Regards,

/Yago



-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boolean-type-supports-docValues-tp4312004.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Cannot talk to ZooKeeper - Updates are disabled (Solr 6.3.0)

2016-12-29 Thread Yago Riveiro
If I lost quorum on Zookeeper, this is a “fault” in the Zookeeper cluster,
therefore I should see something in the logs, right?

The question here is: why do I need to restart the node again? If Zookeeper
recovers its quorum, the Solr node should be in read-write mode again …

Any ideas how I can test whether I lost the Zookeeper quorum?
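
(One way to check, sketched below: ZooKeeper answers the four-letter command
"stat" on its client port, and a member that has lost quorum replies that it is
not currently serving requests. The host names here are hypothetical.)

import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class ZkQuorumCheck {
  public static void main(String[] args) throws Exception {
    String[] ensemble = {"zk1:2181", "zk2:2181", "zk3:2181"};
    for (String hostPort : ensemble) {
      String[] hp = hostPort.split(":");
      try (Socket socket = new Socket()) {
        socket.connect(new InetSocketAddress(hp[0], Integer.parseInt(hp[1])), 2000);
        OutputStream out = socket.getOutputStream();
        out.write("stat".getBytes(StandardCharsets.US_ASCII));
        out.flush();
        socket.shutdownOutput();
        StringBuilder reply = new StringBuilder();
        InputStream in = socket.getInputStream();
        byte[] buf = new byte[4096];
        for (int n = in.read(buf); n != -1; n = in.read(buf)) {
          reply.append(new String(buf, 0, n, StandardCharsets.US_ASCII));
        }
        // A healthy member prints "Mode: leader" or "Mode: follower"; a member
        // without quorum answers that it is not currently serving requests.
        System.out.println(hostPort + " -> " + reply.toString().trim());
      } catch (Exception e) {
        System.out.println(hostPort + " -> unreachable: " + e.getMessage());
      }
    }
  }
}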

--

/Yago Riveiro

On 29 Dec 2016 16:07 +, Susheel Kumar <susheel2...@gmail.com>, wrote:
> I believe this comes when Zookeeper quorum is not maintained. Do not see
> any way around except bringing the quorum back?
>
> Thanks,
> Susheel
>
> On Thu, Dec 29, 2016 at 9:27 AM, Yago Riveiro <yago.rive...@gmail.com
> wrote:
>
> > There is any way to recover from a exception
> > "org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates
> > are disabled" without restart the affected node node?
> >
> > Regards,
> > /Yago
> >
> >
> >
> > -
> > Best regards
> >
> > /Yago
> > --
> > View this message in context: http://lucene.472066.n3.
> > nabble.com/Cannot-talk-to-ZooKeeper-Updates-are-
> > disabled-Solr-6-3-0-tp4311582.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >


Cannot talk to ZooKeeper - Updates are disabled (Solr 6.3.0)

2016-12-29 Thread Yago Riveiro
Is there any way to recover from an exception
"org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates
are disabled" without restarting the affected node?

Regards,
/Yago



-
Best regards

/Yago
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Cannot-talk-to-ZooKeeper-Updates-are-disabled-Solr-6-3-0-tp4311582.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Uncaught exception java.lang.StackOverflowError in 6.3.0

2016-12-27 Thread Yago Riveiro
bq: "That is really a job for streaming, not simple faceting.”

True, it’s the next step to improve our performance (right now we are using 
JSON facets), and 6.3.0 has a lot of useful tools to work with streaming 
expressions. Our last release before 6.3 was 5.3.1 and the streaming 
expressions were buggy in some scenarios.
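
(As a rough illustration of that direction, a minimal SolrJ sketch that counts
the distinct values of a high-cardinality field by reading a sorted /export
stream instead of faceting; the zk host, collection and the url_s field are
hypothetical, and url_s needs docValues for /export. The rollup()/unique()
streaming expressions can push the same work onto the cluster.)

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.io.SolrClientCache;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;
import org.apache.solr.client.solrj.io.stream.StreamContext;

public class DistinctUrlCount {
  public static void main(String[] args) throws Exception {
    Map<String, String> params = new HashMap<>();
    params.put("q", "*:*");
    params.put("fl", "url_s");
    params.put("sort", "url_s asc");  // a sorted stream lets us count value transitions
    params.put("qt", "/export");

    CloudSolrStream stream = new CloudSolrStream("zk1:2181/solr", "collection1", params);
    StreamContext context = new StreamContext();
    SolrClientCache cache = new SolrClientCache();
    context.setSolrClientCache(cache);
    stream.setStreamContext(context);

    long distinct = 0;
    String previous = null;
    try {
      stream.open();
      for (Tuple t = stream.read(); !t.EOF; t = stream.read()) {
        String url = t.getString("url_s");
        if (url != null && !url.equals(previous)) {  // new value in the sorted stream
          distinct++;
          previous = url;
        }
      }
    } finally {
      stream.close();
      cache.close();
    }
    System.out.println("distinct urls: " + distinct);
  }
}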

bq: "Okay. You could create a new collection with the wanted amount of shards 
and do a full re-index into that.”

True, you are right but we are trying to avoid that (this point falls into 
“keep management low”).

Solr it’s a amazing tool, with a lack of auto magic management stuff. You have 
all the power and therefore all the work :p

Following your advices I will try to review the topology of my collection and 
try to point the oversharded collections.

--

/Yago Riveiro

On 27 Dec 2016 21:54 +, Toke Eskildsen <t...@statsbiblioteket.dk>, wrote:
> Yago Riveiro <yago.rive...@gmail.com> wrote:
> > One thing that I forget to mention is that my clients can aggregate
> > by any field in the schema with limit=-1, this is not a problem with
> > 99% of the fields, but 2 or 3 of them are URLs. URLs has very
> > high cardinality and one of the reasons to sharding collections is
> > to lower the memory footprint to not blow the node and do the
> > last merge in a big machine.
>
> That is really a job for streaming, not simple faceting.
>
> Even if you insist on faceting, the problem remains that your merger needs to 
> be powerful enough to process the full result set. Using that machine with a 
> single shard collection instead would eliminate the excessive overhead of 
> doing distributed faceting on millions of values, sparing a lot of hardware 
> allocation, which could be used to beef up the single-shard hardware even 
> more.
>
> [Toke: You can always split later]
>
> > Every time I run the SPLITSHARD command, the command fails
> > in a different way. IMHO right now Solr doesn’t have an efficient
> > way to rebalance collection’s shard.
>
> Okay. You coul create a new collection with the wanted amount of shards and 
> do a full re-index into that.
>
> [Toke: "And yes, more logistics on your part as one size no longer fits all”]
>
> > The key point of this deploy is reduce the amount of management
> > as much as possible,
>
> That is your prerogative. I hope my suggestions can be used by other people 
> with similar challenges then.
>
> - Toke Eskildsen


Re: Uncaught exception java.lang.StackOverflowError in 6.3.0

2016-12-27 Thread Yago Riveiro
One thing that I forgot to mention is that my clients can aggregate by any
field in the schema with limit=-1. This is not a problem with 99% of the
fields, but 2 or 3 of them are URLs. URLs have very high cardinality, and one of
the reasons for sharding collections is to lower the memory footprint so as not
to blow up a node, and to do the last merge on a big machine.

"Should a collection grow past whatever threshold you determine, you can always 
split it.”

Every time I run the SPLITSHARD command, the command fails in a different way.
IMHO right now Solr doesn’t have an efficient way to rebalance a collection’s
shards.

"And yes, more logistics on your part as one size no longer fits all”

The key point of this deploy is to reduce the amount of management as much as
possible; Solr improved the management of the cluster a lot in comparison with
the 4.x releases. Even so, it remains difficult to manage a big cluster without
custom tools.

Solr continues to improve with each version, and I saw issues with a lot of 
nice stuff like SOLR-9735 and SOLR-9241

--

/Yago Riveiro

On 26 Dec 2016 22:10 +, Toke Eskildsen <t...@statsbiblioteket.dk>, wrote:
> Yago Riveiro <yago.rive...@gmail.com> wrtoe:
> > My cluster holds more than 10B documents stored in 15T.
> >
> > The size of my collections is variable but I have collections with 800M
> > documents distributed over the 12 nodes, the amount of documents per shard
> > is ~66M and indeed the performance is good.
>
> The math supports Erick's point about over-sharding. On average you have:
> 15 TB/ 1200 collections / 12 shards ~= 1GB / shard.
> 10B docs / 1200 collections / 12 shards ~= 700K docs/shard
>
> While your 12 shards fits well with your large collections, such as the one 
> you described above, they are a very poor match for your average collection. 
> Assuming your collections behave roughly the same way as each other, your 
> average and smaller than average collections would be much better off with 
> just 1 shard (and 2 replicas). That eliminates the overhead of distributed 
> search-requests (for that collection) and lowers your overall shard-count 
> significantly. Should a collection grow past whatever threshold you 
> determine, you can always split it.
>
> Better performance, lower hardware requirements, more manageable shard 
> amount. And yes, more logistics on your part as one size no longer fits all.
>
> - Toke Eskildsen


Bad version writing to ZK in 6.3.0

2016-12-26 Thread Yago Riveiro
Lately I can read this warning in my logs sometimes:

Bad version writing to ZK using compare-and-set, will force refresh cluster
state: KeeperErrorCode = BadVersion for /collections/X/state.json

Why does this happen? Is it normal?

--

/Yago



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Bad-version-writing-to-ZK-in-6-3-0-tp4311204.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Uncaught exception java.lang.StackOverflowError in 6.3.0

2016-12-26 Thread Yago Riveiro
My cluster holds more than 10B documents stored in 15T.

The size of my collections is variable but I have collections with 800M
documents distributed over the 12 nodes, the amount of documents per shard
is ~66M and indeed the performance is good.

I need the collections to isolate the data of my clients and for scalability
reasons. Isolating data in collections gives the power to allocate the data to
new machines in an easy way, or to promote my clients to better hardware.

In a situation like that, fast restarts are critical to ensure availability
and to recover from situations where 2 or more nodes go down.




-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Uncaught-exception-java-lang-StackOverflowError-in-6-3-0-tp4309849p4311200.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Getting Error - Session expired for /collections/sprod/state.json

2016-12-16 Thread Yago Riveiro
Do some GC profiling to get some information about it. It's possible you have
configured a small heap and you are running into GC stop-the-world issues.

Normally zookeeper errors are tied to GC and network latency issues.
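
(A trivial sketch of the kind of numbers to watch; this reads the JVM's own GC
beans, so run it in-process, over remote JMX, or look at the same data in the
GC logs:)

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcSnapshot {
  public static void main(String[] args) {
    // Cumulative GC counts and time; large jumps in collection time between two
    // snapshots point to the stop-the-world pauses that expire ZooKeeper sessions.
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      System.out.printf("%s: collections=%d, time=%dms%n",
          gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
    }
  }
}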

--

/Yago Riveiro

On 16 Dec 2016, 09:49 +, Piyush Kunal <piyush.ku...@myntra.com>, wrote:
> Looks like an issue with 6.x version then.
> But this seems too basic. Not sure if community would not have caught this
> till now.
>
> On Fri, Dec 16, 2016 at 2:55 PM, Yago Riveiro <yago.rive...@gmail.com
> wrote:
>
> > I had some of this error in my logs too on 6.3.0
> >
> > My cluster also index like 20K docs/sec I don't know why.
> >
> > --
> >
> > /Yago Riveiro
> >
> > On 16 Dec 2016, 08:39 +, Piyush Kunal <piyush.ku...@myntra.com>,
> > wrote:
> > > Anyone has noticed such issue before?
> > >
> > > On Thu, Dec 15, 2016 at 4:36 PM, Piyush Kunal <piyush.ku...@myntra.com
> > > wrote:
> > >
> > > > This is happening when heavy indexing like 100/second is going on.
> > > >
> > > > On Thu, Dec 15, 2016 at 4:33 PM, Piyush Kunal <piyush.ku...@myntra.com
> > > > wrote:
> > > >
> > > > > - We have solr6.1.0 cluster running on production with 1 shard and 5
> > > > > replicas.
> > > > > - Zookeeper quorum on 3 nodes.
> > > > > - Using a chroot in zookeeper to segregate the configs from other
> > > > > collections.
> > > > > - Using solrj5.1.0 as our client to query solr.
> > > > >
> > > > >
> > > > >
> > > > > Usually things work fine but on and off we witness this exception
> > coming
> > > > > up:
> > > > > =
> > > > > org.apache.solr.common.SolrException: Could not load collection from
> > > > > ZK:sprod
> > > > > at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive
> > > > > (ZkStateReader.java:815)
> > > > > at org.apache.solr.common.cloud.ZkStateReader$5.get(ZkStateRead
> > > > > er.java:477)
> > > > > at org.apache.solr.client.solrj.impl.CloudSolrClient.getDocColl
> > > > > ection(CloudSolrClient.java:1174)
> > > > > at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWit
> > > > > hRetryOnStaleState(CloudSolrClient.java:807)
> > > > > at org.apache.solr.client.solrj.impl.CloudSolrClient.request(Cl
> > > > > oudSolrClient.java:782)
> > > > > --
> > > > > Caused by: org.apache.zookeeper.KeeperException$
> > SessionExpiredException:
> > > > > KeeperErrorCode = Session expired for /collections/sprod/state.json
> > > > > at org.apache.zookeeper.KeeperException.create(KeeperException.
> > > > > java:127)
> > > > > at org.apache.zookeeper.KeeperException.create(KeeperException.
> > > > > java:51)
> > > > > at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
> > > > > at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkCl
> > > > > ient.java:311)
> > > > > at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkCl
> > > > > ient.java:308)
> > > > > at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(Zk
> > > > > CmdExecutor.java:61)
> > > > > at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClien
> > > > > t.java:308)
> > > > > --
> > > > > org.apache.solr.common.SolrException: Could not load collection from
> > > > > ZK:sprod
> > > > > at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive
> > > > > (ZkStateReader.java:815)
> > > > > at org.apache.solr.common.cloud.ZkStateReader$5.get(ZkStateRead
> > > > > er.java:477)
> > > > > at org.apache.solr.client.solrj.impl.CloudSolrClient.getDocColl
> > > > > ection(CloudSolrClient.java:1174)
> > > > > at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWit
> > > > > hRetryOnStaleState(CloudSolrClient.java:807)
> > > > > at org.apache.solr.client.solrj.impl.CloudSolrClient.request(Cl
> > > > > oudSolrClient.java:782)
> > > > > --
> > > > > Caused by: org.apache.zookeeper.KeeperException$
> > SessionExpiredException:
> > > > > KeeperErrorCode = Session expired fo

Re: Getting Error - Session expired for /collections/sprod/state.json

2016-12-16 Thread Yago Riveiro
I had some of these errors in my logs too on 6.3.0.

My cluster also indexes like 20K docs/sec; I don't know why this happens.

--

/Yago Riveiro

On 16 Dec 2016, 08:39 +, Piyush Kunal <piyush.ku...@myntra.com>, wrote:
> Anyone has noticed such issue before?
>
> On Thu, Dec 15, 2016 at 4:36 PM, Piyush Kunal <piyush.ku...@myntra.com
> wrote:
>
> > This is happening when heavy indexing like 100/second is going on.
> >
> > On Thu, Dec 15, 2016 at 4:33 PM, Piyush Kunal <piyush.ku...@myntra.com
> > wrote:
> >
> > > - We have solr6.1.0 cluster running on production with 1 shard and 5
> > > replicas.
> > > - Zookeeper quorum on 3 nodes.
> > > - Using a chroot in zookeeper to segregate the configs from other
> > > collections.
> > > - Using solrj5.1.0 as our client to query solr.
> > >
> > >
> > >
> > > Usually things work fine but on and off we witness this exception coming
> > > up:
> > > =
> > > org.apache.solr.common.SolrException: Could not load collection from
> > > ZK:sprod
> > > at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive
> > > (ZkStateReader.java:815)
> > > at org.apache.solr.common.cloud.ZkStateReader$5.get(ZkStateRead
> > > er.java:477)
> > > at org.apache.solr.client.solrj.impl.CloudSolrClient.getDocColl
> > > ection(CloudSolrClient.java:1174)
> > > at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWit
> > > hRetryOnStaleState(CloudSolrClient.java:807)
> > > at org.apache.solr.client.solrj.impl.CloudSolrClient.request(Cl
> > > oudSolrClient.java:782)
> > > --
> > > Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > KeeperErrorCode = Session expired for /collections/sprod/state.json
> > > at org.apache.zookeeper.KeeperException.create(KeeperException.
> > > java:127)
> > > at org.apache.zookeeper.KeeperException.create(KeeperException.
> > > java:51)
> > > at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
> > > at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkCl
> > > ient.java:311)
> > > at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkCl
> > > ient.java:308)
> > > at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(Zk
> > > CmdExecutor.java:61)
> > > at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClien
> > > t.java:308)
> > > --
> > > org.apache.solr.common.SolrException: Could not load collection from
> > > ZK:sprod
> > > at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive
> > > (ZkStateReader.java:815)
> > > at org.apache.solr.common.cloud.ZkStateReader$5.get(ZkStateRead
> > > er.java:477)
> > > at org.apache.solr.client.solrj.impl.CloudSolrClient.getDocColl
> > > ection(CloudSolrClient.java:1174)
> > > at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWit
> > > hRetryOnStaleState(CloudSolrClient.java:807)
> > > at org.apache.solr.client.solrj.impl.CloudSolrClient.request(Cl
> > > oudSolrClient.java:782)
> > > --
> > > Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > KeeperErrorCode = Session expired for /collections/sprod/state.json
> > > at org.apache.zookeeper.KeeperException.create(KeeperException.
> > > java:127)
> > > at org.apache.zookeeper.KeeperException.create(KeeperException.
> > > java:51)
> > > at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
> > > at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkCl
> > > ient.java:311)
> > > at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkCl
> > > ient.java:308)
> > > at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(Zk
> > > CmdExecutor.java:61)
> > > at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClien
> > > t.java:308)
> > > =
> > >
> > >
> > >
> > >
> > >
> > > This is our zoo.cfg:
> > > ==
> > > tickTime=2000
> > > dataDir=/var/lib/zookeeper
> > > clientPort=2181
> > > initLimit=5
> > > syncLimit=2
> > > server.1=192.168.70.27:2888:3888
> > > server.2=192.168.70.64:2889:3889
> > > server.3=192.168.70.26:2889:3889

Re: Uncaught exception java.lang.StackOverflowError in 6.3.0

2016-12-15 Thread Yago Riveiro
Yes, I changed the value of coreLoadThreads.

With the default value a node takes like 40 minutes to be available with all 
replicas up.

Right now I have ~1.2K collections with 12 shards each, 2 replicas, spread over
12 nodes. Indeed the value I configured is maybe too much (2048), but I can
start nodes in 10 minutes.

I need to review the value to something more conservative maybe.

--

/Yago Riveiro

On 15 Dec 2016, 16:43 +, Erick Erickson <erickerick...@gmail.com>, wrote:
> Hmmm, have you changed coreLoadThreads? We had a problem with this a
> while back with loading lots and lots of cores, see:
> https://issues.apache.org/jira/browse/SOLR-7280
>
> But that was fixed in 6.2, so unless you changed the number of threads
> used to load cores it shouldn't be a problem on 6.3...
>
> The symptom was also that replicas would never change to "active",
> they'd be stuck in ercovery or down.
>
> Best,
> Erick
>
> On Thu, Dec 15, 2016 at 3:07 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:
> > Hi,
> >
> > I'm getting this error in my log
> >
> > 12/15/2016, 9:28:18 AM ERROR true ExecutorUtil Uncaught exception
> > java.lang.StackOverflowError thrown by thread:
> > coreZkRegister-1-thread-48-processing-n:XXX.XXX.XXX.XXX:8983_solr
> > x:collection1_shard3_replica2 s:shard3 c:collection1-visitors r:core_node5
> > java.lang.Exception: Submitter stack trace
> > at
> > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:204)
> > at org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:204)
> > at org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:505)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > at
> > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> > at
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > at
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > at java.lang.Thread.run(Thread.java:745)
> >
> >
> >
> > -
> > Best regards
> > --
> > View this message in context: 
> > http://lucene.472066.n3.nabble.com/Uncaught-exception-java-lang-StackOverflowError-in-6-3-0-tp4309849.html
> > Sent from the Solr - User mailing list archive at Nabble.com.


Uncaught exception java.lang.StackOverflowError in 6.3.0

2016-12-15 Thread Yago Riveiro
Hi,

I'm getting this error in my log

12/15/2016, 9:28:18 AM  ERROR  true  ExecutorUtil  Uncaught exception
java.lang.StackOverflowError thrown by thread:
coreZkRegister-1-thread-48-processing-n:XXX.XXX.XXX.XXX:8983_solr
x:collection1_shard3_replica2 s:shard3 c:collection1-visitors r:core_node5
java.lang.Exception: Submitter stack trace
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:204)
at org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:204)
at 
org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:505)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Uncaught-exception-java-lang-StackOverflowError-in-6-3-0-tp4309849.html
Sent from the Solr - User mailing list archive at Nabble.com.


Zookeeper connection lost in 5.5.3

2016-11-28 Thread Yago Riveiro
Hi, 

I upgraded my cluster to 5.5.3 and now I'm having a lot of these warnings.

Unable to read
/collections/collectionX/leader_initiated_recovery/shard9/core_node12 due
to: org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for
/collections/collectionX/leader_initiated_recovery/shard9/core_node12

Also one node lost connection with zookeeper and was ejected from the
cluster.

Any clue about how I can debug this? 



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Zookeeper-connection-lost-in-5-5-3-tp4307804.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to enable JMX to monitor Jetty

2016-11-28 Thread Yago Riveiro
Hi,

Rallavagu, is the jetty-jmx.xml file the basic file from the GitHub repository
or something custom?

I modified the file modules/http.mod and I can't see the Jetty stuff ...



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-enable-JMX-to-monitor-Jetty-tp4278246p4307802.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Load core process changed between 5.5.3 and 5.3.1

2016-11-20 Thread Yago Riveiro
Indeed, in 5.3.1 the CPU spiked to a load of 80, and now the cluster is more
stable, slower but more stable.

Thanks.
  
--

/Yago Riveiro
On Nov 20 2016, at 4:31 pm, Erick Erickson <erickerick...@gmail.com> wrote:  

> see: https://issues.apache.org/jira/browse/SOLR-7280

>

> The problem is that when the number of load threads is unbounded and  
you have lots of cores, you can get into a state where replicas don't  
come up because of OOM errors and getting them back up is  
hard/impossible. Plus an OOM error is scary as the state of your  
system is questionable.

>

> You can adjust the number of threads, see the ref guide for  
"coreLoadThreads" in the  element of solr.xml. This the current  
ref guide, but it's the same as 5.5:  
https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml

>

> Best,  
Erick

>

>  
On Sun, Nov 20, 2016 at 7:46 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:  
> Hi,  
>  
> I'm trying to upgrade my cluster from Solr version 5.3.1. to 5.5.3 and I  
> noticed that the core loading process in 5.5.3 is different from 5.3.1.  
>  
> The number of core loaded in parallel in 5.5.3 are about 5 or 6, when in  
> 5.3.1 all cores were published as state "recovering" all together.  
>  
> This is the new behaviour or something is wrong with my setup?  
>  
> Reload a node with 5.5.3 is slower compared with 5.3.1.  
>  
>  
>  
> \-  
> Best regards  
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Load-core-process-changed-between-5-5-3-and-5-3-1-tp4306588.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Load core process changed between 5.5.3 and 5.3.1

2016-11-20 Thread Yago Riveiro
Hi,

I'm trying to upgrade my cluster from Solr version 5.3.1. to 5.5.3 and I
noticed that the core loading process in 5.5.3 is different from 5.3.1. 

The number of cores loaded in parallel in 5.5.3 is about 5 or 6, when in
5.3.1 all cores were published as state "recovering" all together.

Is this the new behaviour, or is something wrong with my setup?

Reloading a node with 5.5.3 is slower compared with 5.3.1.



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Load-core-process-changed-between-5-5-3-and-5-3-1-tp4306588.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Permission error using install_solr_service script.sh

2016-10-01 Thread Yago Riveiro
And yes, I executed the script as root using the /opt folder as install folder

--

/Yago Riveiro

On 1 Oct 2016, 15:18 +0100, Shawn Heisey <apa...@elyograg.org>, wrote:
> On 9/29/2016 3:42 AM, Yago Riveiro wrote:
> > I'm having troubles to run the install_solr_service in Centos 7.2.
> >
> > I have this errors:
> >
> > -bash: /opt/usr/solr/bin/solr: Permission denied
> > -bash: /opt/usr/solr/bin/solr: Permission denied
> >
> > the problematic line is the line 315 on install_solr_service.sh
> >
> > find "$SOLR_VAR_DIR" -type f -print0 | xargs -0 chmod 0640
> >
> > This line should set execution permissions to executable files, the
> > permissions should be something like 750 to allow the user and the user
> > group to run the scripts.
>
> The var dir (which is usually in /var) does not contain anything that
> needs the executable bit set. The scripts that need to be executable
> are in under the install dir and in etc/init.d/.
>
> Are you running the install script as root? It will need root
> privileges to do everything it must do. This looks like a case of the
> user running the script not having the permission to change permissions.
>
> Thanks,
> Shawn
>


Re: Permission error using install_solr_service script.sh

2016-10-01 Thread Yago Riveiro
I was running the install in a VM; I deleted it and did a new provisioning.
Now it works.

The weird thing is that before deleting the VM I ran a chmod over those files and
the install finished without those errors.

--

/Yago Riveiro

On 1 Oct 2016, 15:18 +0100, Shawn Heisey <apa...@elyograg.org>, wrote:
> On 9/29/2016 3:42 AM, Yago Riveiro wrote:
> > I'm having troubles to run the install_solr_service in Centos 7.2.
> >
> > I have this errors:
> >
> > -bash: /opt/usr/solr/bin/solr: Permission denied
> > -bash: /opt/usr/solr/bin/solr: Permission denied
> >
> > the problematic line is the line 315 on install_solr_service.sh
> >
> > find "$SOLR_VAR_DIR" -type f -print0 | xargs -0 chmod 0640
> >
> > This line should set execution permissions to executable files, the
> > permissions should be something like 750 to allow the user and the user
> > group to run the scripts.
>
> The var dir (which is usually in /var) does not contain anything that
> needs the executable bit set. The scripts that need to be executable
> are in under the install dir and in etc/init.d/.
>
> Are you running the install script as root? It will need root
> privileges to do everything it must do. This looks like a case of the
> user running the script not having the permission to change permissions.
>
> Thanks,
> Shawn
>


Permission error using install_solr_service script.sh

2016-09-29 Thread Yago Riveiro
Hi, 

I'm having troubles to run the install_solr_service in Centos 7.2.

I have this errors:

-bash: /opt/usr/solr/bin/solr: Permission denied
-bash: /opt/usr/solr/bin/solr: Permission denied

the problematic line is the line 315 on install_solr_service.sh

find "$SOLR_VAR_DIR" -type f -print0 | xargs -0 chmod 0640

This line should set execution permissions to executable files, the
permissions should be something like 750 to allow the user and the user
group to run the scripts. 



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Permission-error-using-install-solr-service-script-sh-tp4298580.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Whether SolrCloud can support 2 TB data?

2016-09-24 Thread Yago Riveiro
 "LucidWorks achieved 150k docs/second"

  

This is only valid if you don't have replication. I don't know your use case,
but a realistic use case normally uses some type of redundancy to not lose data
in a hardware failure, at least 2 replicas; more implies a reduction of
throughput. Also don't forget that in a realistic use case you have to handle
reads too.
  
Our cluster is small for the data we hold (12 machines with SSD and 32G of
RAM), but we don't need sub-second queries; we need faceting with high
cardinality (in worst-case scenarios we aggregate 5M unique string values).

As Shawn probably told you, sizing your cluster is a trial-and-error path. Our
cluster is optimized to handle a low rate of reads, facet queries and a high
rate of inserts.

At a peak of inserts we can handle around 25K docs per second without any
issue with 2 replicas and without compromising reads or putting a node under
stress. Nodes under stress can eject themselves from the Zookeeper cluster due
to a GC or a lack of CPU to communicate.

If you want accurate data you need to test.
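
(A trivial sketch of the kind of ingestion test meant here: index synthetic
docs in batches with SolrJ and measure docs/second; the zk host, collection and
schema are hypothetical:)

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IngestionTest {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient("zk1:2181/solr")) {
      client.setDefaultCollection("loadtest");
      long total = 1_000_000;
      int batchSize = 1000;
      long start = System.nanoTime();
      List<SolrInputDocument> batch = new ArrayList<>(batchSize);
      for (long i = 0; i < total; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", Long.toString(i));
        doc.addField("value_s", "value-" + (i % 5_000_000));  // roughly high-cardinality field
        batch.add(doc);
        if (batch.size() == batchSize) {
          client.add(batch);  // replicas see the same write amplification as production
          batch.clear();
        }
      }
      if (!batch.isEmpty()) client.add(batch);
      client.commit();
      double seconds = (System.nanoTime() - start) / 1e9;
      System.out.printf("indexed %d docs at %.0f docs/sec%n", total, total / seconds);
    }
  }
}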
  
Keep in mind the most important thing about Solr, in my opinion: at terabyte
scale any field type schema change or Lucene codec change will force you to do
a full reindex. Each time I need to update Solr to a major release it's a pain
in the ass to convert the segments if they are not compatible with the newer
version. This can take months, will not ensure your data will be equal to a
clean index (voodoo magic things can happen, trust me), and it will drain a
huge amount of hardware resources to do it without downtime.

  
--

/Yago Riveiro

  
On Sep 24 2016, at 7:48 am, S G <sg.online.em...@gmail.com> wrote:  

> Hey Yago,

>

> 12 T is very impressive.

>

> Can you also share some numbers about the shards, replicas, machine  
count/specs and docs/second for your case?  
I think you would not be having a single index of 12 TB too. So some  
details on that would be really helpful too.

>

> https://lucidworks.com/blog/2014/06/03/introducing-the-solr-scale-toolkit/  
is a good post how LucidWorks achieved 150k docs/second.  
If you have any such similar blog, that would be quite useful and popular  
too.

>

> \--SG

>

> On Fri, Sep 23, 2016 at 5:00 PM, Yago Riveiro <yago.rive...@gmail.com>  
wrote:

>

> > In my company we have a SolrCloud cluster with 12T.  
>  
> My advices:  
>  
> Be nice with CPU you will needed in some point (very important if you have  
> not control over the kind of queries to the cluster, clients are greedy,  
> the want all results at the same time)  
>  
> SSD and memory (as many as you can afford if you will do facets)  
>  
> Full recoveries are a pain, network it's important and should be as fast  
> as possible, never less than 1Gbit.  
>  
> Divide and conquer, but too much can drive you to an expensive overhead,  
> data travels over the network. Find the sweet point (only testing you use  
> case you will know)  
>  
> \--  
>  
> /Yago Riveiro  
>  
> On 23 Sep 2016, 23:44 +0100, Pushkar Raste <pushkar.ra...@gmail.com>,  
> wrote:  
> > Solr is RAM hungry. Make sure that you have enough RAM to have most if  
> the  
> > index of a core in the RAM itself.  
> >  
> > You should also consider using really good SSDs.  
> >  
> > That would be a good start. Like others said, test and verify your setup.  
> >  
> > \--Pushkar Raste  
> >  
> > On Sep 23, 2016 4:58 PM, "Jeffery Yuan" <yuanyun...@gmail.com> wrote:  
> >  
> > Thanks so much for your prompt reply.  
> >  
> > We are definitely going to use SolrCloud.  
> >  
> > I am just wondering whether SolrCloud can scale even at TB data level and  
> > what kind of hardware configuration it should be.  
> >  
> > Thanks.  
> >  
> >  
> >  
> > \--  
> > View this message in context: [http://lucene.472066.n3.](http://lucene.472
066.n3.=c29sci11c2VyQGx1Y2VuZS5hcGFjaGUub3Jn)  
> > nabble.com/Whether-solr-can-support-2-TB-data-tp4297790p4297800.html  
> > Sent from the Solr - User mailing list archive at Nabble.com.  
>



Re: Whether SolrCloud can support 2 TB data?

2016-09-23 Thread Yago Riveiro
In my company we have a SolrCloud cluster with 12T.

My advice:

Be nice with CPU, you will need it at some point (very important if you don't
have control over the kind of queries to the cluster; clients are greedy, they
want all results at the same time)

SSD and memory (as much as you can afford if you will do facets)

Full recoveries are a pain, the network is important and should be as fast as
possible, never less than 1Gbit.

Divide and conquer, but too much can drive you to an expensive overhead, data
travels over the network. Find the sweet spot (you will only know by testing
your use case)

--

/Yago Riveiro

On 23 Sep 2016, 23:44 +0100, Pushkar Raste <pushkar.ra...@gmail.com>, wrote:
> Solr is RAM hungry. Make sure that you have enough RAM to have most if the
> index of a core in the RAM itself.
>
> You should also consider using really good SSDs.
>
> That would be a good start. Like others said, test and verify your setup.
>
> --Pushkar Raste
>
> On Sep 23, 2016 4:58 PM, "Jeffery Yuan" <yuanyun...@gmail.com> wrote:
>
> Thanks so much for your prompt reply.
>
> We are definitely going to use SolrCloud.
>
> I am just wondering whether SolrCloud can scale even at TB data level and
> what kind of hardware configuration it should be.
>
> Thanks.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Whether-solr-can-support-2-TB-data-tp4297790p4297800.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Heap memory usage is -1 in UI

2016-09-23 Thread Yago Riveiro
This is happening in 5.3.1

This metric is interesting to know the minimal memory footprint of a core (data
structures and caches).

I agree with Shawn that if Solr doesn't support the metric it should be removed
from the admin UI, but I insist on the fact that it's useful to plot memory
consumption in services like Zabbix.
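
(Until then, the whole-JVM heap, though not a per-core figure, can at least be
read over JMX; a trivial sketch of the numbers a Zabbix-style poller would
collect:)

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapSnapshot {
  public static void main(String[] args) {
    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    // used/committed/max heap in megabytes for the whole JVM
    System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
        heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
  }
}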

--

/Yago Riveiro

On 23 Sep 2016, 01:08 +0100, Shawn Heisey <apa...@elyograg.org>, wrote:
> On 9/22/2016 4:59 PM, Yago Riveiro wrote:
> > The Heap Memory Usage in the UI it's always -1. There is some way to
> > get the amount of heap that a core consumes?
>
> In all the versions that I have looked at, up to 6.0, this number is
> either entirely too small or -1.
>
> Looking into the code, this info comes from the /admin/luke handler, and
> that handler gets it from Lucene. The -1 appears to come into play when
> the reader object is not the expected type, so I'm guessing that past
> changes in Lucene require changes in Solr that have not yet happened.
> Even if the code is fixed so the reader object(s) are calculated
> correctly, that won't be enough information for a true picture of core
> memory usage.
>
> In order for this number to be accurate, size information from other
> places, such as Lucene caches and Solr caches, must also be included.
> There might also be memory structures involved that I haven't even
> thought of. It is entirely possible that the code to gather all this
> information does not yet exist.
>
> In my opinion, the Heap Memory statistic should be removed until a time
> when it can be overhauled so that it is as accurate as possible. Can
> you open an issue in Jira?
>
> Thanks,
> Shawn
>


Heap memory usage is -1 in UI

2016-09-22 Thread Yago Riveiro
The Heap Memory Usage in the UI is always -1.

Is there some way to get the amount of heap that a core consumes?



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Heap-memory-usage-is-1-in-UI-tp4297601.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Miserable Experience Using Solr. Again.

2016-09-13 Thread Yago Riveiro
I'm stuck on 5.3.1 because if I upgrade to 5.5 or 6.x my cluster dies.

Doing a rolling upgrade, when I upgrade the second node to 5.5 both nodes die in
the peer-sync phase; I don't know what changed in 5.5 but it demands a huge
quantity of memory to check whether the replica is in sync.

This kind of stuff and the full re-index (12T) between major releases are
indeed a pain.

Cryptic errors and a deficient system to get metrics about what is going on
inside the cluster are another issue: I'm unable to get the throughput of a
collection as a whole, the number of HTTP connections on each node, the
utilization of the Jetty thread pool and stuff like that.

Solr is a great tool, but it's hard, too hard to get into.
--

/Yago Riveiro
On Sep 13 2016, at 10:46 am, Alessandro Benedetti <abenede...@apache.org>
wrote:  

> First of all I second Bram, I am sorry you had a bad experience with Solr,  
but I think that:  
\- without a minimum study and documentation  
\- without trying to follow the best practices  
I think you are going to have a "miserable" experience with any software,  
don't you ?

>

> In addition to Bram :

>

> On Mon, Sep 12, 2016 at 10:48 PM, Aaron Greenspan <  
aaron.greens...@plainsite.org> wrote:  
>  
> It didn’t say which field type. Buried in the logs I found a reference in  
> the Java stack trace—which *disappears* (and distorts the viewing window  
> horribly) after a few seconds when you try to view it in the web log UI—to  
> the string "units="degrees"".  
>

>

> This si a bug, and it is really annoying, not sure anyone already raised  
it, if not I suggest you to do that :)  
But you can use the logs themselves without any problem.

>

> >  
> Apparently there is some aspect of the Thai text field type that Solr  
> 6.2.0 doesn’t like. So I disabled it. I don’t use Thai text.  
>

>

> If you were not using the Thai text, why had you the Thai Text field type  
defined ?  
Keep It Simple Stupid is the way :)  
I find tons of Solr instances in production mith monster solrconfig.xml and  
schema.xml. basically the old default ones, without any particular reason.  
Don't do that !

>

> >  
> Now Solr was complaining about "Error loading class  
> 'solr.admin.AdminHandlers'". So I found the reference to  
> solr.admin.AdminHandlers in solrconfig.xml for each of my cores and  
> commented it out. Only then did Solr work again.  
>

>

> Seems to be you didn't take care of reading the update release notes, did  
you ?

>

>  
Cheers  
\--  
\--

>

> Benedetti Alessandro  
Visiting card : http://about.me/alessandro_benedetti

>

> "Tyger, tyger burning bright  
In the forests of the night,  
What immortal hand or eye  
Could frame thy fearful symmetry?"

>

> William Blake - Songs of Experience -1794 England



Unable to upgrade from 5.4 to 5.5.2

2016-08-12 Thread Yago Riveiro
I'm trying to upgrade my Solr cluster from 5.4 to 5.5.2 in a rolling restart
without success.  
  
The first node upgraded with 5.4 worked without any issue; the problem arises
with the second. When the second node is restarted with the 5.4 version, the
heap of both nodes grows until the first node (I don't know why, but it is
always the first node) hits an OOM.

It's like something in the PeerSync process is consuming RAM (my index is huge,
I have replicas with 250G) until it hits the OOM.

  

https://issues.apache.org/jira/browse/SOLR-8586 was added in 5.5; this issue
changes the way shard synchronization is done. Can this issue be related
to my problem?

  

--

/Yago Riveiro



Re: Idle timeout expired: 50000/50000 ms

2016-07-14 Thread Yago Riveiro
Recently I started buffering docs and sending them to Solr in blocks of 250
with 50 workers. But now I'm hitting this issue too with Solr 5.3.1.

Googling a bit I found this:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=435322 which was fixed in the 9.2
version.

There is another link where people describe in the comments that they stopped
running into this after upgrading to 9.2.



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Idle-timeout-expired-5-5-ms-tp4273515p4287219.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Re: Can Solr 5.5 recognize the index result generated by SolrIndexer of old version Nutch ?

2016-06-01 Thread Yago Riveiro
I did the process from 4.0 to 4.10 (I have disk docValues in my index) with the
IndexUpgrader tool.

Indeed, I don't know if this process works from 1.4 to 4.10 ...

But googling a bit I found this:
http://stackoverflow.com/questions/25450865/migrate-solr-1-4-index-files-to-4-7

It's like Erick said, you will need to do this process in several steps before
you reach 5.x.
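
(For reference, a minimal sketch of driving the tool from Java with the 5.x/6.x
Lucene API; the core must not be loaded by Solr while it runs, and each major
version step needs the matching Lucene jars:)

import java.nio.file.Paths;
import org.apache.lucene.index.IndexUpgrader;
import org.apache.lucene.store.FSDirectory;

public class UpgradeOneCore {
  public static void main(String[] args) throws Exception {
    // args[0]: path to a core's data/index directory; the replica must not be
    // loaded by Solr while this runs.
    try (FSDirectory dir = FSDirectory.open(Paths.get(args[0]))) {
      new IndexUpgrader(dir).upgrade();  // rewrites every segment in the current format
    }
  }
}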
  
  
--

/Yago Riveiro

On Jun 1 2016, at 5:22 pm, Erick Erickson erickerick...@gmail.com
wrote:  

> https://lucene.apache.org/core/4_1_0/core/org/apache/lucene/index/IndexUpgrader.html

>

> I'm not sure how far back this tool will work, i.e. I don't know if  
it'll successfully go from 1.4 - 5.x.  
You may have to pull a Solr 3x version to upgrade from 1.4-3x, then a  
4x version to upgrade 3x-4x  
and then finally a 5x version 4x-5x. If the IndexUpgraderTool even  
existed in 3x (that was a  
long time ago!).

>

> You can get old Solr versions here:  
<http://archive.apache.org/dist/lucene/solr/>

>

> Best,  
Erick

>

> On Wed, Jun 1, 2016 at 8:57 AM, t...@sina.com wrote:  
 Hi, Yago,  
 Could you tell me the IndexUpgrade tool exactly? It is a tool released in
the Solr binary or some command line?  
 ThanksLiu Peng  
  
 ----- Original Message -----
 From: Yago Riveiro yago.rive...@gmail.com
 To: solr-user solr-user@lucene.apache.org, solr-u...@lucene.apache.org, t...@sina.com
 Subject: Re: Can Solr 5.5 recognize the index result generated by SolrIndexer
of old version Nutch ?
 Date: 2016-06-01 17:58
  
  
  
  
 You need to upgrade your index to version 4.10 using the IndexUpgrade
tool.  
  
  
 \--  
  
 Yago Riveiro  
  
  
 On 1 Jun 2016 10:53 +0100, t...@sina.com, wrote:  
  
 Hi,  
  
 We plan to upgrade the solr server to 5.5.0. And we have a customized
crawler based on Nutch 1.2 and Solr 1.4.1.  
  
  
  
 So, the question is: can Solr 5.5 recognize the index result generated by
SolrIndexer of Nutch 1.2?  
  
 Thanks  
  
  
  
  




Re: Can Solr 5.5 recognize the index result generated by SolrIndexer of old version Nutch ?

2016-06-01 Thread Yago Riveiro
You need to upgrade your index to version 4.10 using the IndexUpgrade tool.

--
Yago Riveiro

On 1 Jun 2016 10:53 +0100, t...@sina.com, wrote:
> Hi,
> We plan to upgrade the solr server to 5.5.0. And we have a customized crawler 
> based on Nutch 1.2 and Solr 1.4.1.
> 
> So, the question is: can Solr 5.5 recognize the index result generated by 
> SolrIndexer of Nutch 1.2?
> Thanks


Re: Facet by truncated date

2016-03-31 Thread Yago Riveiro
Emir,

  

I assume that this query will create N ranges (one for each day) and give you
the counts; in this case it works indeed. I confess that I never used facet
ranges before.

What output will the range query give? The result of the ranges, or the dates
truncated with the counts?
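
(For reference, a minimal SolrJ sketch of the range facet Emir suggests,
assuming the date field is called my_date and a hypothetical zk host and
collection; the response lists each range start, i.e. the truncated day,
together with its count:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.RangeFacet;

public class DayRangeFacet {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient("zk1:2181/solr")) {
      client.setDefaultCollection("collection1");
      SolrQuery query = new SolrQuery("*:*");
      query.setRows(0);
      query.setFacet(true);
      // one bucket per day for the last 10 days, matching the params quoted below
      query.add("facet.range", "my_date");
      query.add("facet.range.start", "NOW/DAY-10DAYS");
      query.add("facet.range.end", "NOW/DAY+1DAY");
      query.add("facet.range.gap", "+1DAY");
      QueryResponse rsp = client.query(query);
      for (RangeFacet.Count count : rsp.getFacetRanges().get(0).getCounts()) {
        System.out.println(count.getValue() + " -> " + count.getCount());
      }
    }
  }
}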

  
--

/Yago Riveiro

On Mar 31 2016, at 10:26 am, Emir Arnautovic
emir.arnauto...@sematext.com wrote:  

> Hi Yago,  
Not sure if I misunderstood the case, but assuming you have date field  
called my_date you can facet last 10 days by day using range queries:

>

> ?facet.range=my_date&facet.range.start=NOW/DAY-10DAYS&facet.range.end=NOW/DAY+1DAY&facet.range.gap=+1DAY

>

> Regards,  
Emir

>

> On 31.03.2016 11:14, Yago Riveiro wrote:  
 If you want aggregate the dat by the truncated date, I think the only way
to  
 do it is using other field with the truncated date.  
  
  
  
 You can use a update request processor to calculate the truncated data  
 (https://wiki.apache.org/solr/UpdateRequestProcessor) or add the field in  
 indexing time.  
  
  
  
 date:"2016-03-31T12:00:0Z"  
  
 truncated_date_s:'2016-03-31' or truncated_date_i:20160331 (this should
be  
 more memory efficient)  
  
 \\--  
  
  
  
 /Yago Riveiro  
  
  
  

 On Mar 31 2016, at 10:08 am, Emir Arnautovic
 <emir.arnauto...@sematext.com> wrote:

 Hi Robert,
 You can use range faceting and set use facet.range.gap to set how dates
 are "truncated".

 Regards,
 Emir

 On 31.03.2016 10:52, Robert Brown wrote:
 > Hi,
 >
 > Is it possible to facet by a date (solr.TrieDateField) but truncated
 > to the day, or even the hour?
 >
 > If not, are there any other options apart from storing that truncated
 > data in another (string?) field?
 >
 > Thanks,
 > Rob
 >

 --
 Monitoring * Alerting * Anomaly Detection * Centralized Log Management
 Solr & Elasticsearch Support * http://sematext.com/
  


>

> \--  
Monitoring * Alerting * Anomaly Detection * Centralized Log Management  
Solr  Elasticsearch Support * <http://sematext.com/>



Re: Facet by truncated date

2016-03-31 Thread Yago Riveiro
If you want to aggregate the data by the truncated date, I think the only way to
do it is to use another field with the truncated date.

You can use an update request processor to calculate the truncated date
(https://wiki.apache.org/solr/UpdateRequestProcessor) or add the field at
indexing time.

date:"2016-03-31T12:00:0Z"

truncated_date_s:'2016-03-31' or truncated_date_i:20160331 (this should be
more memory efficient)
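
(A minimal sketch of such an update processor, with a hypothetical class name,
assuming the incoming field is "date" and the target is "truncated_date_i"; in
a real chain the incoming value may arrive as a String or Date depending on
where the processor sits:)

import java.io.IOException;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Date;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class TruncateDateProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object value = doc.getFieldValue("date");
        if (value instanceof Date) {
          ZonedDateTime utc = ((Date) value).toInstant().atZone(ZoneOffset.UTC);
          // 2016-03-31T12:00:00Z -> 20160331, matching the example above
          int truncated = utc.getYear() * 10000 + utc.getMonthValue() * 100 + utc.getDayOfMonth();
          doc.setField("truncated_date_i", truncated);
        }
        super.processAdd(cmd);  // continue down the processor chain
      }
    };
  }
}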

--

/Yago Riveiro

On Mar 31 2016, at 10:08 am, Emir Arnautovic
emir.arnauto...@sematext.com wrote:  

> Hi Robert,  
You can use range faceting and set use facet.range.gap to set how dates  
are "truncated".

>

> Regards,  
Emir

>

> On 31.03.2016 10:52, Robert Brown wrote:  
 Hi,  
  
 Is it possible to facet by a date (solr.TrieDateField) but truncated  
 to the day, or even the hour?  
  
 If not, are there any other options apart from storing that truncated  
 data in another (string?) field?  
  
 Thanks,  
 Rob  
  


>

> \--  
Monitoring * Alerting * Anomaly Detection * Centralized Log Management  
Solr  Elasticsearch Support * <http://sematext.com/>



Re: Unable to create collection in 5.5

2016-03-28 Thread Yago Riveiro
Because I have a codebase that relies on logic to resolve the names of
collections.

With this modification I'm forced to have logic to handle old and new
collections, when this should be transparent.

If I have collections collection-1, collection-2, collection-3 created with an
external tool, upgrading to 5.5 I now have collection-1, collection-2,
collection-3 and collection_x.

A way to resolve this problem could be aliases, but the Collections API doesn't
list the aliases in the LIST command, and reading the noisy CLUSTERSTATE command
to fetch collections (and aliases) in a cluster with thousands of collections is
a no-no.

Sorry, but without a way to rename the old collections to collection_*, enforcing
that hyphens are not allowed is frustrating as a user.

  

--

/Yago Riveiro

On Mar 28 2016, at 6:07 pm, Anshum Gupta ans...@anshumgupta.net wrote:  

> I'm not sure why this would be a problem as older collections would  
continue to work just fine. Do you mean that the restriction doesn't allow  
you to e.g. add a shard with a valid name, to an older collection ?

>

> On Mon, Mar 28, 2016 at 9:22 AM, Yago Riveiro yago.rive...@gmail.com  
wrote:

>

>  This kind of stuff can't be released without a way to rename the
current  
 collections with hyphens (even for 6.0)  
  
  
  
 \\--  
  
  
  
 /Yago Riveiro  
  
  
  
  
 On Mar 28 2016, at 5:19 pm, Anshum Gupta
lt;ans...@anshumgupta.netgt;  
 wrote:  
  
  Yes, this was added in 5.5, though I think it shouldn't have been,  
 specially the hyphens.  
 The hyphen was added back as part of SOLR-8725 but it would only be would  
 with 6.0 (and 5.5.1).  
  
   
  
   
 On Mon, Mar 28, 2016 at 7:36 AM, Yago Riveiro
lt;yago.rive...@gmail.com  
 gt;  
 wrote:  
  
   
  
  gt; Hi,  
 gt;  
 gt; With solr 5.5 I can't create a collection with the name
collection-16,  
 and  
 gt; in 5.3.1 I can do it, Why?  
 gt;  
 gt; lt;?xml version="1.0" encoding="UTF-8"?gt;  
 gt; lt;responsegt;  
 gt; lt;lst name="responseHeader"gt;lt;int  
 name="status"gt;400lt;/intgt;lt;int  
 gt;
name="QTime"gt;1lt;/intgt;lt;/lstgt;lt;lst  
 name="error"gt;lt;lst  
 name="metadata"gt;lt;str  
 gt; name="error-  
 class"gt;org.apache.solr.common.SolrExceptionlt;/strgt;
p;lt;str  
 gt;  
 gt; name="root-error-  
  
 class"gt;org.apache.solr.common.SolrExceptionlt;/strgt;
p;lt;/lstgt;lt;str  
 gt; name="msg"gt;Invalid name: 'collection-16' Identifiers must
consist  
 entirely  
 gt; of periods, underscores and
alphanumericslt;/strgt;lt;int  
 gt; name="code"gt;400lt;/intgt;lt;/lstgt;  
 gt; lt;/responsegt;  
 gt;  
 gt;  
 gt;  
 gt; \\-  
 gt; Best regards  
 gt; \\--  
 gt; View this message in context:  
 gt; http://lucene.472066.n3.nabble.com/Unable-to-create-
collection-  
 in-5-5-tp4266437.html  
 gt; Sent from the Solr - User mailing list archive at Nabble.com.  
 gt;  
  
   
  
  \\--  
 Anshum Gupta  
  


>

>  
\--  
Anshum Gupta



Re: Unable to create collection in 5.5

2016-03-28 Thread Yago Riveiro
This kind of stuff can't be released without a way to rename the current
collections with hyphens (even for 6.0)

  

--

/Yago Riveiro

On Mar 28 2016, at 5:19 pm, Anshum Gupta ans...@anshumgupta.net wrote:  

> Yes, this was added in 5.5, though I think it shouldn't have been,  
specially the hyphens.  
The hyphen was added back as part of SOLR-8725 but it would only be would  
with 6.0 (and 5.5.1).

>

>  
On Mon, Mar 28, 2016 at 7:36 AM, Yago Riveiro yago.rive...@gmail.com  
wrote:

>

>  Hi,  
  
 With solr 5.5 I can't create a collection with the name collection-16,
and  
 in 5.3.1 I can do it, Why?  
  
 ?xml version="1.0" encoding="UTF-8"?  
 response  
 lst name="responseHeader"int
name="status"400/intint  
 name="QTime"1/int/lstlst name="error"lst
name="metadata"str  
 name="error-
class"org.apache.solr.common.SolrException/strstr  
  
 name="root-error-
class"org.apache.solr.common.SolrException/str/lststr  
 name="msg"Invalid name: 'collection-16' Identifiers must consist
entirely  
 of periods, underscores and alphanumerics/strint  
 name="code"400/int/lst  
 /response  
  
  
  
 \-  
 Best regards  
 \--  
 View this message in context:  
 <http://lucene.472066.n3.nabble.com/Unable-to-create-collection-
in-5-5-tp4266437.html>  
 Sent from the Solr - User mailing list archive at Nabble.com.  


>

> \--  
Anshum Gupta



Unable to create collection in 5.5

2016-03-28 Thread Yago Riveiro
Hi,

With Solr 5.5 I can't create a collection with the name collection-16, and
in 5.3.1 I can do it. Why?



<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">1</int></lst>
<lst name="error"><lst name="metadata"><str name="error-class">org.apache.solr.common.SolrException</str>
<str name="root-error-class">org.apache.solr.common.SolrException</str></lst>
<str name="msg">Invalid name: 'collection-16' Identifiers must consist entirely
of periods, underscores and alphanumerics</str><int name="code">400</int></lst>
</response>




-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unable-to-create-collection-in-5-5-tp4266437.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Yago Riveiro
I did the IndexUpgrader path to upgrade my 4.x index to 5.x (15 terabytes of
data and growing). It wasn't an easy task to do it without downtime;
IndexUpgrader doesn't work if the replica is loaded.

With 12T of data a re-index is like a no-no operation (the time spent doing the
re-index can take several months).

Optimizing one replica at a time doesn't work (all replicas are optimized at
the same time), killing CPU and IO and, as a result, the cluster.

Conclusion: if I need to do it again to upgrade to a newer version of Solr I'm
literally in trouble ...
  

\--

  

/Yago Riveiro

> On Mar 24 2016, at 4:32 pm, Tomás Fernández Löbbe
tomasflo...@gmail.com wrote:  

>

>   
  
 Not to mention the fact that Solr 6 is using deprecated Lucene 6  
 numeric types if those are removed in Lucene 7, then what?  
  
 I believe this is going to be an issue. We have SOLR-8396  
https://issues.apache.org/jira/browse/SOLR-8396; open, but it doesn't
look  
like it's going to make it to 6.0 (I tried to look at it but I didn't have  
time the past weeks). We'll have to support it until Solr 8 I guess.

>

> Tomás



Re: java.lang.NullPointerException in json facet hll function

2016-03-22 Thread Yago Riveiro
Nop.  
  
A normal query with wt=json  
  
the q parameter is *:*

  

The only particular thing about this index is that some docs have the field
visitor__visitor_id as dynamic type long and others have the field as type
string (our indexer tool didn't resolve the type right as a result of a bug,
which was fixed later).
  
In fact if I add q=visitor__visitor_id_l:* to query I have no error.

  

I think the problem is that I have the field "visitor__visitor_id" with _s and
_l mixed in the index. But this should not be a problem because they are two
independent fields, isn't it?
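
(For reference, a minimal SolrJ sketch of that workaround: the same JSON facet,
but with an fq restricting the result to docs that actually carry the long
variant; the zk host and collection name are hypothetical:)

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.ModifiableSolrParams;

public class HllFacetWithFilter {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient("zk1:2181/solr")) {
      client.setDefaultCollection("collection1");
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.set("q", "*:*");
      params.set("rows", "0");
      // only docs that actually have the long-typed dynamic field
      params.set("fq", "visitor__visitor_id_l:[* TO *]");
      params.set("json.facet",
          "{group:{type:terms,field:group,limit:-1,sort:{index:asc},numBuckets:true,"
        + "facet:{col_1_unique_visitors:'hll(visitor__visitor_id_l)'}}}");
      QueryResponse rsp = client.query(params);
      System.out.println(rsp.getResponse().get("facets"));
    }
  }
}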

  

\--

  

/Yago Riveiro

> On Mar 22 2016, at 5:00 pm, Yonik Seeley ysee...@gmail.com wrote:  

>

> Hmmm, looks like the "hll" value is missing for some reason. It's not  
clear why that would happen... are you running any custom code?

>

> -Yonik

>

> On Tue, Mar 22, 2016 at 12:54 PM, Yago Riveiro
yago.rive...@gmail.com wrote:  
 Solr version: 5.3.1  
  
 With this query:  
  
 group:  
 {  
 type:terms,  
 limit:-1,  
 field:group,  
 sort:{index:asc},  
 numBuckets:true,  
 facet:{  
 col_1_unique_visitors:'hll(visitor__visitor_id_l)'  
 }  
 }  
 }  
  
 visitor__visitor_id_l is a dynamic field.  
  
 Running the query described above I'm hitting this exception.  
  
 java.lang.NullPointerException at  
 org.apache.solr.search.facet.HLLAgg$Merger.merge(HLLAgg.java:86) at  

org.apache.solr.search.facet.FacetBucket.mergeBucket(FacetModule.java:410)  
 at  
 org.apache.solr.search.facet.FacetFieldMerger.mergeBucketList(FacetModule
.java:510)  
 at
org.apache.solr.search.facet.FacetFieldMerger.merge(FacetModule.java:488)  
 at
org.apache.solr.search.facet.FacetFieldMerger.merge(FacetModule.java:462)  
 at  

org.apache.solr.search.facet.FacetBucket.mergeBucket(FacetModule.java:410)  
 at
org.apache.solr.search.facet.FacetQueryMerger.merge(FacetModule.java:337)  
 at  

org.apache.solr.search.facet.FacetModule.handleResponses(FacetModule.java:178)  
 at  
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchH
andler.java:410)  
 at  
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBa
se.java:143)  
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) at  
 org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669) at  
 org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462) at  
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.ja
va:214)  
 at  
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.ja
va:179)  
 at  
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHand
ler.java:1652)  
 at  

org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)  
 at  

org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)  
 at  

org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)  
 at  
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.j
ava:223)  
 at  
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.j
ava:1127)  
 at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)  
 at  
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.ja
va:185)  
 at  
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.ja
va:1061)  
 at  

org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)  
 at  
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextH
andlerCollection.java:215)  
 at  
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollecti
on.java:110)  
 at  

org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)  
 at org.eclipse.jetty.server.Server.handle(Server.java:499) at  
 org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at  

org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)  
 at  

org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)  
 at  
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.ja
va:635)  
 at  
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.jav
a:555)  
 at java.lang.Thread.run(Thread.java:745)  
  
  
  
 \-  
 Best regards  
 \--  
 View this message in context: <http://lucene.472066.n3.nabble.com/java-
lang-NullPointerException-in-json-facet-hll-function-tp4265378.html>  
 Sent from the Solr - User mailing list archive at Nabble.com.



java.lang.NullPointerException in json facet hll function

2016-03-22 Thread Yago Riveiro
Solr version: 5.3.1

With this query:

group:
{
type:terms,
limit:-1,
field:group,
sort:{index:asc},
numBuckets:true,
facet:{
col_1_unique_visitors:'hll(visitor__visitor_id_l)'
}
}
}

visitor__visitor_id_l is a dynamic field.

Running the query described above I'm hitting this exception.

java.lang.NullPointerException at
org.apache.solr.search.facet.HLLAgg$Merger.merge(HLLAgg.java:86) at
org.apache.solr.search.facet.FacetBucket.mergeBucket(FacetModule.java:410)
at
org.apache.solr.search.facet.FacetFieldMerger.mergeBucketList(FacetModule.java:510)
at org.apache.solr.search.facet.FacetFieldMerger.merge(FacetModule.java:488)
at org.apache.solr.search.facet.FacetFieldMerger.merge(FacetModule.java:462)
at
org.apache.solr.search.facet.FacetBucket.mergeBucket(FacetModule.java:410)
at org.apache.solr.search.facet.FacetQueryMerger.merge(FacetModule.java:337)
at
org.apache.solr.search.facet.FacetModule.handleResponses(FacetModule.java:178)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:410)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669) at
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462) at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499) at
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745) 



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/java-lang-NullPointerException-in-json-facet-hll-function-tp4265378.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: JSON facets, count a long or an integer in cloud and non-cloud modes

2016-03-22 Thread Yago Riveiro
I have a feeling that this is related to the number of nodes in the cluster.

My dev runs in cloud mode but only has one node, production has 12, and the
version is the same.

--

Yago Riveiro

> On Mar 22 2016, at 9:13 am, Markus Jelsma markus.jel...@openindex.io
wrote:  

>

> I'm now using instanceof as ugly work around but i'd prefer a decent
solution.  
M

>

>  
  
-----Original message-----  
 From: Yago Riveiro <yago.rive...@gmail.com>  
 Sent: Tuesday 22nd March 2016 9:52  
 To: solr-user solr-user@lucene.apache.org; solr-
u...@lucene.apache.org  
 Subject: Re: JSON facets, count a long or an integer in cloud and non-
cloud modes  
  
 I have the same problem with a custom response writer.  
  
 In production works but in my dev doesn't and are the same version 5.3.1  
  
 \--  
 Yago Riveiro  
  
 On 22 Mar 2016 08:47 +, Markus
Jelsmamarkus.jel...@openindex.io, wrote:  
  Hello,  
   
  Using SolrJ i built a method that consumes output produced by JSON
facets, it also checks the count before further processing the output:  
   
  <result name="response" numFound="49" start="0">  
  </result>  
  <lst name="facets">  
  <int name="count">49</int>  
  <lst name="by_day">  
  <arr name="buckets">  
  <lst>  
   
  This is the code reading the count value via SolrJ:  
   
  QueryResponse response = sourceClient.query(query);  
  NamedList jsonFacets =
(NamedList)response.getResponse().get("facets");  
  int totalOccurences = (int)jsonFacets.get("count");  
   
  The problem is, this code doesn't work in unit tests, it throws a:  
  java.lang.ClassCastException: java.lang.Long cannot be cast to
java.lang.Integer!?  
   
  But why it is an integer right? Anyway, i change the totalOccurences
and the cast to a long and the unit tests runs just fine. But when actually
running the code, i suddenly get another cast exception at exactly the same
line.  
  java.lang.ClassCastException: java.lang.Integer cannot be cast to
java.lang.Long  
   
  What is going on? The only difference is that the unit tests runs in
cloud mode via AbstractFullDistribZkTestBase, but i run the code in a local
dev non-cloud mode. I haven't noticed this behaviour anywhere else although i
have many unit tests consuming lots of different pieces of Solr output, and
all that code runs fine in non-cloud mode too.  
   
  Is this to be expected, normal? Did i catch another bug?  
   
  Thanks!  
  Markus  




Re: JSON facets, count a long or an integer in cloud and non-cloud modes

2016-03-22 Thread Yago Riveiro
I have the same problem with a custom response writer.

In production it works but in my dev it doesn't, and both are the same version, 5.3.1.

--
Yago Riveiro

On 22 Mar 2016 08:47 +, Markus Jelsma<markus.jel...@openindex.io>, wrote:
> Hello,
> 
> Using SolrJ i built a method that consumes output produced by JSON facets, it 
> also checks the count before further processing the output:
> 
> <result name="response" numFound="49" start="0">
> </result>
> <lst name="facets">
> <int name="count">49</int>
> <lst name="by_day">
> <arr name="buckets">
> <lst>
> This is the code reading the count value via SolrJ:
> 
> QueryResponse response = sourceClient.query(query);
> NamedList jsonFacets = (NamedList)response.getResponse().get("facets");
> int totalOccurences = (int)jsonFacets.get("count");
> 
> The problem is, this code doesn't work in unit tests, it throws a:
> java.lang.ClassCastException: java.lang.Long cannot be cast to 
> java.lang.Integer!?
> 
> But why it is an integer right? Anyway, i change the totalOccurences and the 
> cast to a long and the unit tests runs just fine. But when actually running 
> the code, i suddenly get another cast exception at exactly the same line.
> java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Long
> 
> What is going on? The only difference is that the unit tests runs in cloud 
> mode via AbstractFullDistribZkTestBase, but i run the code in a local dev 
> non-cloud mode. I haven't noticed this behaviour anywhere else although i 
> have many unit tests consuming lots of different pieces of Solr output, and 
> all that code runs fine in non-cloud mode too.
> 
> Is this to be expected, normal? Did i catch another bug?
> 
> Thanks!
> Markus
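
A minimal sketch of one way to read that count without instanceof juggling:
both Integer and Long are java.lang.Number, so a Number cast works in cloud
and non-cloud mode alike (variable names follow the quoted SolrJ snippet):

NamedList<?> jsonFacets = (NamedList<?>) response.getResponse().get("facets");
long totalOccurences = ((Number) jsonFacets.get("count")).longValue();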


IllegalArgumentException: Seeking to negative position

2016-03-08 Thread Yago Riveiro
I saw this exception in my log. What could have caused this?

java.lang.IllegalArgumentException: Seeking to negative position:
MMapIndexInput(path="/opt/solr/node/collections/2016_shard9_replica2/data/index/_0.fdx")
at
org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.seek(ByteBufferIndexInput.java:407)
at 
org.apache.lucene.codecs.CodecUtil.retrieveChecksum(CodecUtil.java:400)
at 
org.apache.solr.handler.IndexFetcher.compareFile(IndexFetcher.java:843)
at 
org.apache.solr.handler.IndexFetcher.isIndexStale(IndexFetcher.java:914)
at
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:376)
at
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:254)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:380)
at
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:162)
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
Caused by: java.lang.IllegalArgumentException
at java.nio.Buffer.position(Buffer.java:244)
at
org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.seek(ByteBufferIndexInput.java:404)
... 9 more



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/IllegalArgumentException-Seeking-to-negative-position-tp4262463.html
Sent from the Solr - User mailing list archive at Nabble.com.


How can I monitor the jetty thread pool

2016-03-07 Thread Yago Riveiro
Hi,

How can I monitor the jetty thread pool?

I want to build a Zabbix graph with this info, but JMX doesn't show any
entry for it.
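
A minimal sketch of pulling the pool numbers over JMX. It assumes Solr was
started with remote JMX enabled (e.g. ENABLE_REMOTE_JMX_OPTS=true and an RMI
port of 18983 in solr.in.sh) and that Jetty's thread pool is actually exported
as an MBean; the object-name pattern and attribute names below are assumptions
that depend on the Jetty version:

import java.util.Set;
import javax.management.*;
import javax.management.remote.*;

public class JettyPoolProbe {
    public static void main(String[] args) throws Exception {
        // Connect to the JMX endpoint exposed by the Solr JVM.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:18983/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = jmxc.getMBeanServerConnection();
            // Look for anything Jetty registers under its thread utilities domain.
            Set<ObjectName> names = conn.queryNames(
                    new ObjectName("org.eclipse.jetty.util.thread:*"), null);
            for (ObjectName name : names) {
                for (String attr : new String[] {"threads", "idleThreads", "queueSize"}) {
                    try {
                        System.out.println(name + " " + attr + "=" + conn.getAttribute(name, attr));
                    } catch (AttributeNotFoundException ignored) {
                        // this MBean doesn't expose that attribute; skip it
                    }
                }
            }
        } finally {
            jmxc.close();
        }
    }
}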



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-can-I-monitor-the-jetty-thread-pool-tp4262298.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Bulk delete of Solr documents

2016-02-08 Thread Yago Riveiro
Yes.

You can delete using a query:

http://blog.dileno.com/archive/201106/delete-documents-from-solr-index-by-query/
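
A minimal sketch of a bulk delete-by-query against the update handler
(collection name and query are placeholders; commit behaviour is up to you):

curl 'http://localhost:8983/solr/<collection>/update?commit=true' \
  -H 'Content-Type: application/xml' \
  -d '<delete><query>date:[* TO 2015-12-31T23:59:59Z]</query></delete>'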

  

--

Yago Riveiro

> On Feb 8 2016, at 4:35 pm, Anil anilk...@gmail.com wrote:  

>

> Hi ,

>

> Can we delete solr documents from a collection in a bulk ?

>

> Regards,  
Anil



Solr Replication error

2016-01-24 Thread Yago Riveiro
I caught this in my logs. Any reason for this to happen?

My Solr version is 5.3.1.

Index fetch failed :org.apache.solr.common.SolrException: Index fetch failed
: 
at
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:515)
at
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:254)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:380)
at
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:162)
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
Caused by: java.nio.file.NoSuchFileException:
/opt/solrcloud-node/collections/collection-2016_shard9_replica2/data/index.20160105005921682/segments_fhj
at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at
sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:335)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:236)
at 
org.apache.lucene.store.Directory.openChecksumInput(Directory.java:109)
at 
org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:294)
at
org.apache.solr.handler.IndexFetcher.hasUnusedFiles(IndexFetcher.java:568)
at
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:397)
... 5 more




-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Replication-error-tp4252929.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Scaling SolrCloud

2016-01-21 Thread Yago Riveiro
It's not a typo. I was wrong: for ZooKeeper, 2 nodes still count as a majority.
It's not the desirable configuration but it is tolerable.

Thanks Erick.

--

Yago Riveiro

> On Jan 21 2016, at 4:15 am, Erick Erickson erickerick...@gmail.com
wrote:  

>

> bq: 3 are to risky, you lost one you lost quorum

>

> Typo? You need to lose two.

>

> On Wed, Jan 20, 2016 at 6:25 AM, Yago Riveiro yago.rive...@gmail.com
wrote:  
 Our Zookeeper cluster is an ensemble of 5 machines, is a good starting
point,  
 3 are to risky, you lost one you lost quorum and with 7 sync cost
increase.  
  
  
  
 ZK cluster is in machines without IO and rotative hdd (don't not use SDD
to  
 gain IO performance, zookeeper is optimized to spinning disks).  
  
  
  
 The ZK cluster behaves without problems, the first deploy of ZK was in
the  
 same machines that the Solr Cluster (ZK log in its own hdd) and that
didn't  
 wok very well, CPU and networking IO from Solr Cluster was too much.  
  
  
  
 About schema modifications.  
  
 Modify the schema to add new fields is relative simple with new API, in
the  
 pass all the work was manually uploading the schema to ZK and reloading
all  
 collections (indexing must be disable or timeouts and funny errors
happen).  
  
 With the new Schema API this is more user friendly. Anyway, I stop
indexing  
 and for reload the collections (I don't know if it's necessary nowadays).  
  
 About Indexing data.  
  
  
  
 We have self made data importer, it's not java and not performs batch
indexing  
 (with 500 collections buffer data and build the batch is expensive and  
 complicate for error handling).  
  
  
  
 We use regular HTTP post in json. Our throughput is about 1000 docs/s
without  
 any type of optimization. Some time we have issues with replication, the
slave  
 can keep pace with leader insertion and a full sync is requested, this is
bad  
 because sync the replica again implicates a lot of IO wait and CPU and
with  
 replicas with 100G take an hour or more (normally when this happen, we
disable  
 indexing to release IO and CPU and not kill the node with a load of 50 or
60).  
  
 In this department my advice is "keep it simple" in the end is an HTTP
POST to  
 a node of the cluster.  
  
  
  
 --  
  
 Yago Riveiro  
  
 On Jan 20 2016, at 1:39 pm, Troy Edwards
 <tedwards415...@gmail.com>  
 wrote:  
  
  
  
 Thank you for sharing your experiences/ideas.  
  
  
  
 Yago since you have 8 billion documents over 500 collections, can you
share  
 what/how you do index maintenance (e.g. add field)? And how are you
loading  
 data into the index? Any experiences around how Zookeeper ensemble
behaves  
 with so many collections?  
  
  
  
 Best,  
  
  
  
  
 On Tue, Jan 19, 2016 at 6:05 PM, Yago Riveiro
<yago.rive...@gmail.com>  
 wrote:  
  
  
  
 gt; What I can say is:  
 gt;  
 gt;  
 gt; * SDD (crucial for performance if the index doesn't fit in
memory, and  
 gt; will not fit)  
 gt; * Divide and conquer, for that volume of docs you will need more
than 6  
 gt; nodes.  
 gt; * DocValues to not stress the java HEAP.  
 gt; * Do you will you aggregate data?, if yes, what is your max  
 gt; cardinality?, this question is the most important to size
correctly the  
 gt; memory needs.  
 gt; * Latency is important too, which threshold is acceptable before  
 gt; consider a query slow?  
 gt; At my company we are running a 12 terabytes (2 replicas) Solr
cluster  
 with  
 gt; 8  
 gt; billion documents sparse over 500 collection . For this we have
about 12  
 gt; machines with SDDs and 32G of ram each (~24G for the heap).  
 gt;  
 gt; We don't have a strict need of speed, 30 second query to
aggregate 100  
 gt; million  
 gt; documents with 1M of unique keys is fast enough for us, normally
the  
 gt; aggregation performance decrease as the number of unique keys
increase,  
 gt; with  
 gt; low unique key factor, queries take less than 2 seconds if data
is in OS  
 gt; cache.  
 gt;  
 gt; Personal recommendations:  
 gt;  
 gt; * Sharding is important and smart sharding is crucial, you don't
want  
 gt; run queries on data that is not interesting (this slow down
queries when  
 gt; the dataset is big).  
 gt; * If you want measure speed do it with about 1 billion documents
to  
 gt; simulate something real (real for 10 billion document world).  
 gt; * Index with re-indexing in mind. with 10 billion docs, re-index
data  
 gt; takes months ... This is important if you don't use regular
features of  
 gt; Solr. In my case I configured Docvalues with disk format (not
standard  
 gt; feature in 4.x) and at some point this format was deprecated.
Upgrade  
 Solr  
 gt; to 5.x was an epic 3 months battle to do it without full
downtime.  
 gt; * Solr is like your girlfriend, will demand love and care and
plenty of  
 gt; space to full-recover replicas that in some point are out of
sync, happen  
 a  
 gt; lot restarting nodes (this is a

Re: Scaling SolrCloud

2016-01-20 Thread Yago Riveiro
Our ZooKeeper cluster is an ensemble of 5 machines, which is a good starting point;
3 is too risky (you lose one and you lose quorum) and with 7 the sync cost increases.

The ZK cluster is on machines with no other IO load and rotational HDDs (no need to
use SSDs to gain IO performance; ZooKeeper is optimized for spinning disks).
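
For reference, a minimal zoo.cfg sketch for a 5-node ensemble (hostnames, paths
and timeouts are placeholders, not the actual production values; each node also
needs a matching myid file under dataDir):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
server.4=zk4:2888:3888
server.5=zk5:2888:3888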

  

The ZK cluster behaves without problems. The first deployment of ZK was on the
same machines as the Solr cluster (ZK log on its own HDD) and that didn't work
very well; the CPU and network IO from the Solr cluster was too much.

  

About schema modifications.

Modifying the schema to add new fields is relatively simple with the new API; in the
past all the work was manually uploading the schema to ZK and reloading all
collections (indexing must be disabled or timeouts and funny errors happen).

With the new Schema API this is more user friendly. Anyway, I still stop indexing
and reload the collections (I don't know if it's necessary nowadays).
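
A minimal sketch of that flow with the Schema API and the Collections API
(collection and field names are placeholders):

# add a field without hand-editing the schema in ZooKeeper
curl -X POST -H 'Content-type: application/json' \
  'http://localhost:8983/solr/<collection>/schema' \
  --data-binary '{"add-field":{"name":"new_field_s","type":"string","stored":true,"docValues":true}}'

# reload the collection so every replica picks up the change
curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=<collection>'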
  
About indexing data.

We have a self-made data importer; it's not Java and doesn't perform batch indexing
(with 500 collections, buffering data and building the batches is expensive and
complicates error handling).

We use regular HTTP POSTs in JSON. Our throughput is about 1000 docs/s without
any type of optimization. Sometimes we have issues with replication: the slave
can't keep pace with the leader's insertion rate and a full sync is requested, which
is bad because syncing the replica again implies a lot of IO wait and CPU, and with
100G replicas it takes an hour or more (normally when this happens, we disable
indexing to release IO and CPU and not kill the node with a load of 50 or 60).

In this department my advice is "keep it simple": in the end it is an HTTP POST to
a node of the cluster.
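
A minimal sketch of that kind of "keep it simple" post (host, collection and
field names are placeholders; whether to commit per request is a separate choice):

curl 'http://node-01:8983/solr/<collection>/update' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"doc-1","date":"2016-01-20T00:00:00Z","value_l":42}]'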

  

--

Yago Riveiro

> On Jan 20 2016, at 1:39 pm, Troy Edwards tedwards415...@gmail.com
wrote:  

>

> Thank you for sharing your experiences/ideas.

>

> Yago since you have 8 billion documents over 500 collections, can you share  
what/how you do index maintenance (e.g. add field)? And how are you loading  
data into the index? Any experiences around how Zookeeper ensemble behaves  
with so many collections?

>

> Best,

>

>  
On Tue, Jan 19, 2016 at 6:05 PM, Yago Riveiro yago.rive...@gmail.com  
wrote:

>

>  What I can say is:  
  
  
 * SDD (crucial for performance if the index doesn't fit in memory, and  
 will not fit)  
 * Divide and conquer, for that volume of docs you will need more than 6  
 nodes.  
 * DocValues to not stress the java HEAP.  
 * Do you will you aggregate data?, if yes, what is your max  
 cardinality?, this question is the most important to size correctly the  
 memory needs.  
 * Latency is important too, which threshold is acceptable before  
 consider a query slow?  
 At my company we are running a 12 terabytes (2 replicas) Solr cluster
with  
 8  
 billion documents sparse over 500 collection . For this we have about 12  
 machines with SDDs and 32G of ram each (~24G for the heap).  
  
 We don't have a strict need of speed, 30 second query to aggregate 100  
 million  
 documents with 1M of unique keys is fast enough for us, normally the  
 aggregation performance decrease as the number of unique keys increase,  
 with  
 low unique key factor, queries take less than 2 seconds if data is in OS  
 cache.  
  
 Personal recommendations:  
  
 * Sharding is important and smart sharding is crucial, you don't want  
 run queries on data that is not interesting (this slow down queries when  
 the dataset is big).  
 * If you want measure speed do it with about 1 billion documents to  
 simulate something real (real for 10 billion document world).  
 * Index with re-indexing in mind. with 10 billion docs, re-index data  
 takes months ... This is important if you don't use regular features of  
 Solr. In my case I configured Docvalues with disk format (not standard  
 feature in 4.x) and at some point this format was deprecated. Upgrade
Solr  
 to 5.x was an epic 3 months battle to do it without full downtime.  
 * Solr is like your girlfriend, will demand love and care and plenty of  
 space to full-recover replicas that in some point are out of sync, happen
a  
 lot restarting nodes (this is annoying with replicas with 100G), don't  
 underestimate this point. Free space can save your life.  
  
 \\--  
  
 /Yago Riveiro  
  
  On Jan 19 2016, at 11:26 pm, Shawn Heisey
<apa...@elyograg.org>  
 wrote:  
  
   
  
  On 1/19/2016 1:30 PM, Troy Edwards wrote:  
 gt; We are currently "beta testing" a SolrCloud with 2 nodes and 2
shards  
 with  
 gt; 2 replicas each. The number of documents is about 125000.  
 gt;  
 gt; We now want to scale this to about 10 billion documents.  
 gt;  
 gt; What are the steps to prototyping, hardware estimation and
stress  
 testing?  
  
   
  
  There is no general information available for sizing, because there
are  
 too many factors that will affect the answer

Re: Scaling SolrCloud

2016-01-19 Thread Yago Riveiro
What I can say is:

  * SSDs (crucial for performance if the index doesn't fit in memory, and it will
not fit).
  * Divide and conquer; for that volume of docs you will need more than 6 nodes.
  * DocValues, to not stress the Java heap.
  * Will you aggregate data? If yes, what is your max cardinality? This question
is the most important one to size the memory needs correctly.
  * Latency is important too: which threshold is acceptable before you consider a
query slow?

At my company we are running a 12 terabyte (2 replicas) Solr cluster with 8
billion documents spread over 500 collections. For this we have about 12
machines with SSDs and 32G of RAM each (~24G for the heap).

We don't have a strict need for speed; a 30 second query to aggregate 100 million
documents with 1M unique keys is fast enough for us. Normally the
aggregation performance decreases as the number of unique keys increases; with
a low unique key factor, queries take less than 2 seconds if the data is in the OS
cache.

Personal recommendations:

  * Sharding is important and smart sharding is crucial; you don't want to run
queries on data that is not interesting (this slows queries down when the
dataset is big). See the sketch after this list.
  * If you want to measure speed, do it with about 1 billion documents to simulate
something real (real for a 10 billion document world).
  * Index with re-indexing in mind. With 10 billion docs, re-indexing the data takes
months ... This is important if you don't use regular features of Solr. In my
case I configured DocValues with the disk format (not a standard feature in 4.x) and
at some point this format was deprecated. Upgrading Solr to 5.x was an epic 3
months battle to do it without full downtime.
  * Solr is like your girlfriend: it will demand love and care and plenty of space
to fully recover replicas that at some point are out of sync, which happens a lot
when restarting nodes (this is annoying with 100G replicas); don't
underestimate this point. Free space can save your life.
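
On the "smart sharding" point, a minimal sketch of the pattern of one collection per
time period plus an alias, so queries only touch the data that matters (names,
shard counts and config set are placeholders):

# one collection per day/period, sized for that period's volume
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=events_20160119&numShards=12&replicationFactor=2&collection.configName=events'

# point an alias at just the periods the queries care about
curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=events_recent&collections=events_20160119,events_20160118'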

--

Yago Riveiro

> On Jan 19 2016, at 11:26 pm, Shawn Heisey apa...@elyograg.org wrote:  

>

> On 1/19/2016 1:30 PM, Troy Edwards wrote:  
 We are currently "beta testing" a SolrCloud with 2 nodes and 2 shards
with  
 2 replicas each. The number of documents is about 125000.  
  
 We now want to scale this to about 10 billion documents.  
  
 What are the steps to prototyping, hardware estimation and stress
testing?

>

> There is no general information available for sizing, because there are  
too many factors that will affect the answers. Some of the important  
information that you need will be impossible to predict until you  
actually build it and subject it to a real query load.

>

> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

>

> With an index of 10 billion documents, you may not be able to precisely  
predict performance and hardware requirements from a small-scale  
prototype. You'll likely need to build a full-scale system on a small  
testbed, look for bottlenecks, ask for advice, and plan on a larger  
system for production.

>

> The hard limit for documents on a single shard is slightly less than  
Java's Integer.MAX_VALUE -- just over two billion. Because deleted  
documents count against this max, about one billion documents per shard  
is the absolute max that should be loaded in practice.

>

> BUT, if you actually try to put one billion documents in a single  
server, performance will likely be awful. A more reasonable limit per  
machine is 100 million ... but even this is quite large. You might need  
smaller shards, or you might be able to get good performance with larger  
shards. It all depends on things that you may not even know yet.

>

> Memory is always a strong driver for Solr performance, and I am speaking  
specifically of OS disk cache -- memory that has not been allocated by  
any program. With 10 billion documents, your total index size will  
likely be hundreds of gigabytes, and might even reach terabyte scale.  
Good performance with indexes this large will require a lot of total  
memory, which probably means that you will need a lot of servers and  
many shards. SSD storage is strongly recommended.

>

> For extreme scaling on Solr, especially if the query rate will be high,  
it is recommended to only have one shard replica per server.

>

> I have just added an "extreme scaling" section to the following wiki  
page, but it's mostly a placeholder right now. I would like to have a  
discussion with people who operate very large indexes so I can put real  
usable information in this section. I'm on IRC quite frequently in the  
#solr channel.

>

> https://wiki.apache.org/solr/SolrPerformanceProblems

>

> Thanks,  
Shawn



Json facet query error "null:java.lang.IllegalArgumentException"

2015-12-22 Thread Yago Riveiro
Hi,

I'm hitting an error when I try to run a json facet query on a node that
doesn't have any shard that belongs to the collection. The same query
using the legacy facet method works.

http://devel-16:8983/solr/collection-perf/query?rows=0=*:*={label:{type:terms,field:url,limit:-1,sort:{index:asc},numBuckets:true}}

Error:
HTTP ERROR 500

Problem accessing /solr/collection-perf/select. Reason:

{msg=Illegal character in query at index 102:
http://devel-15:8983/solr/collection-perf/select?rows=0=date:20150101={label:{type:terms,field:url_encoded,limit:-1,sort:{index:asc}}}=json,trace=java.lang.IllegalArgumentException:
Illegal character in query at index 102:
http://devel-15:8983/solr/collection-perf/select?rows=0=date:20150101={label:{type:terms,field:url_encoded,limit:-1,sort:{index:asc}}}=json
at java.net.URI.create(URI.java:859)
at org.apache.http.client.methods.HttpGet.(HttpGet.java:69)
at 
org.apache.solr.servlet.HttpSolrCall.remoteQuery(HttpSolrCall.java:535)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:446)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.URISyntaxException: Illegal character in query at index
102:
http://devel-15:8983/solr/collection-perf/select?rows=0=date:20150101={label:{type:terms,field:url_encoded,limit:-1,sort:{index:asc}}}=json
at java.net.URI$Parser.fail(URI.java:2829)
at java.net.URI$Parser.checkChars(URI.java:3002)
at java.net.URI$Parser.parseHierarchical(URI.java:3092)
at java.net.URI$Parser.parse(URI.java:3034)
at java.net.URI.(URI.java:595)
at java.net.URI.create(URI.java:857)
... 25 more
,code=500}
Powered by Jetty://



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Json-facet-query-error-null-java-lang-IllegalArgumentException-tp4246523.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Json facet query error "null:java.lang.IllegalArgumentException"

2015-12-22 Thread Yago Riveiro
I'm in 5.3.1.

I'm waiting a while before upgrading to 5.4 to see if some nasty bug is reported.
But after hitting this issue I think that I should upgrade ...

--
Yago Riveiro

On Tue, Dec 22, 2015 at 3:17 PM, Yonik Seeley <ysee...@gmail.com> wrote:

> OK found the issue:
>  https://issues.apache.org/jira/browse/SOLR-5971
> Fixed in 5.4
> -Yonik
> On Tue, Dec 22, 2015 at 10:15 AM, Yonik Seeley <ysee...@gmail.com> wrote:
>> This was a generic query-forwarding bug in Solr, that was recently fixed.
>> Not sure the JIRA now... what version are you using?
>> -Yonik
>>
>>
>> On Tue, Dec 22, 2015 at 10:11 AM, Yago Riveiro <yago.rive...@gmail.com> 
>> wrote:
>>> Hi,
>>>
>>> I'm hitting an error when a try to run a json facet query in a node that
>>> doesn't have any shard that belongs to collection. The same query using
>>> using the legacy facet method works.
>>>
>>> http://devel-16:8983/solr/collection-perf/query?rows=0=*:*={label:{type:terms,field:url,limit:-1,sort:{index:asc},numBuckets:true}}
>>>
>>> Error:
>>> HTTP ERROR 500
>>>
>>> Problem accessing /solr/collection-perf/select. Reason:
>>>
>>> {msg=Illegal character in query at index 102:
>>> http://devel-15:8983/solr/collection-perf/select?rows=0=date:20150101={label:{type:terms,field:url_encoded,limit:-1,sort:{index:asc}}}=json,trace=java.lang.IllegalArgumentException:
>>> Illegal character in query at index 102:
>>> http://devel-15:8983/solr/collection-perf/select?rows=0=date:20150101={label:{type:terms,field:url_encoded,limit:-1,sort:{index:asc}}}=json
>>> at java.net.URI.create(URI.java:859)
>>> at org.apache.http.client.methods.HttpGet.(HttpGet.java:69)
>>> at 
>>> org.apache.solr.servlet.HttpSolrCall.remoteQuery(HttpSolrCall.java:535)
>>> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:446)
>>> at
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
>>> at
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
>>> at
>>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>>> at
>>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>>> at
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>>> at
>>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>>> at
>>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>>> at
>>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>>> at
>>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>>> at
>>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>>> at
>>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>>> at
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>>> at
>>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>>> at
>>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>>> at
>>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>>> at org.eclipse.jetty.server.Server.handle(Server.java:499)
>>> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>>> at
>>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>>> at
>>> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>>> at
>>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>>> at
>>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>>> at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.net.URISyntaxException: Illegal character in query at index
>>> 102:
>>> http://devel-15:8983/solr/collection-perf/select?rows=0=date:20150101={label:{type:terms,field:url_encoded,limit:-1,sort:{index:asc}}}=json
>>> at java.net.URI$Parser.fail(URI.java:2829)
>>> at java.net.URI$Parser.checkChars(URI.java:3002)
>>> at java.net.URI$Parser.parseHierarchical(URI.java:3092)
>>> at java.net.URI$Parser.parse(URI.java:3034)
>>> at java.net.URI.(URI.java:595)
>>> at java.net.URI.create(URI.java:857)
>>> ... 25 more
>>> ,code=500}
>>> Powered by Jetty://
>>>
>>>
>>>
>>> -
>>> Best regards
>>> --
>>> View this message in context: 
>>> http://lucene.472066.n3.nabble.com/Json-facet-query-error-null-java-lang-IllegalArgumentException-tp4246523.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.

Re: Json facet api method stream

2015-12-22 Thread Yago Riveiro
The collection has 12 shards distributed over 12 physical nodes (24G heap each,
32G RAM), no replication. All caches are disabled in solrconfig.xml; the rate of
indexing is about 2000 docs/s, which makes caches useless.

At the time of the perf test the number of docs was 34M (now it is 54M, but the set
will grow to 600 million more or less) with 7M (and growing) unique keys. I'm
indexing docs with a url and a user_id.





{
  name: "url_encoded",
  type: "string",
  docValues: true,
  indexed: true,
  stored: true
},
{
  name: "user_id",
  type: "tlong",
  docValues: true,
  multiValued: false,
  indexed: true,
  stored: true
},





The query is simple: aggregate by url with a subfacet on each url to calculate
the estimated unique users.

I'm using Solr 5.3.1.

- Normal query (I guess it uses the DVs under the hood):
json.facet={url:{type:terms,field:url,limit:-1,sort:{index:asc},facet:{users:'hll(user_id)'}}}

- Streaming query:
json.facet={url:{type:terms,field:url,limit:-1,sort:{index:asc},facet:{users:'hll(user_id)'},
 method:stream}}

This is a perf test to see if Solr has the capacity to aggregate the 600M urls
with the unique users, and what the average response time is (minutes are acceptable,
but as little as possible is desirable).


--
Yago Riveiro

On Tue, Dec 22, 2015 at 3:27 PM, Yonik Seeley <ysee...@gmail.com> wrote:

> On Tue, Dec 22, 2015 at 6:06 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:
>> I’m surprised with the difference of speed between DV and stream, the same 
>> query (aggregate 7M unique keys) with stream method takes 21s and with DV is 
>> about 3 minutes ...
> Wow - is this a "real" DV field, or one that was built on-demand in
> the FieldCache?  Were those times for the first request, or subsequent
> requests?
> What are the characteristics of that field... i.e. how many unique
> values in the shard (local index being queried) and how many typical
> values per field?
> And how many docs total on the shard?
> -Yonik

Re: Json facet api method stream

2015-12-22 Thread Yago Riveiro
Ok,




I'm surprised by the difference in speed between DV and stream: the same
query (aggregating 7M unique keys) with the stream method takes 21s and with DV it
is about 3 minutes ...

--
Yago Riveiro

On Tue, Dec 22, 2015 at 1:46 AM, Yonik Seeley <ysee...@gmail.com> wrote:

> On Mon, Dec 21, 2015 at 6:56 PM, Yago Riveiro <yago.rive...@gmail.com> wrote:
>> The json facet API method "stream" uses the docvalues internally for do the
>> aggregation on the fly?
>>
>> I wan't to know if using this method justifies have the docvalues configured
>> in schema.
> It won't use docValues for the actual field being faceted on (because
> streaming in term order means that it's most efficient to use the term
> index and not docValues to find all of the docs that match a given
> term).
> It will use docValues for sub-facets/stats.
> -Yonik

Indexing using a collection alias

2015-12-22 Thread Yago Riveiro
Hi,

Is it possible to index documents using the alias and not the collection name,
if the alias only points to one collection?

The Solr Collections API doesn't allow renaming a collection, so I want to
know if I can achieve this functionality with aliases.

All the documentation I could google uses the alias for read operations ...
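
For reference, a minimal sketch of the alias flow (names are placeholders; whether
updates through an alias are accepted depends on the Solr version, but an alias that
resolves to a single collection is generally usable as an update target):

# create an alias that points at exactly one collection
curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=events&collections=events_2015'

# read and write through the alias name instead of the collection name
curl 'http://localhost:8983/solr/events/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"1"}]'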



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-using-a-collection-alias-tp4246521.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Json facet api method stream

2015-12-22 Thread Yago Riveiro
Here's a live example




[yago@dev-1 ~]$ time curl -g 
"http://dev-1:8983/solr/collection-perf/query?rows=0=date:[20150101%20TO%2020150115]={label:{type:terms,field:url_encoded,limit:-1,sort:{index:asc},facet:{user:'hll(user_id)'}}}"
 > dump




  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current

                                 Dload  Upload   Total   Spent    Left  Speed

100 90.7M    0 90.7M    0     0  1039k      0 --:--:--  0:01:29 --:--:-- 21.2M




real1m29.387s

user0m0.065s

sys 0m0.338s




[yago@dev-1 ~]$ time curl -g 
"http://dev-1/solr/collection-perf/query?rows=0=date:[20150101%20TO%2020150115]={label:{type:terms,field:url_encoded,limit:-1,sort:{index:asc},method:stream,facet:{user:'hll(user_id)'}}}"
 > dump-stream




  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current

                                 Dload  Upload   Total   Spent    Left  Speed

100 90.7M    0 90.7M    0     0  9276k      0 --:--:--  0:00:10 --:--:-- 22.6M




real0m10.026s

user0m0.038s

sys 0m0.245s





[yago@dev-1 ~]$ diff dump dump-stream

[yago@dev-1 ~]$




--
Yago Riveiro

On Tue, Dec 22, 2015 at 3:57 PM, Yago Riveiro <yago.rive...@gmail.com>
wrote:

> The collection is a 12 shards distributed to 12 physical nodes (24G heap 
> each, 32G RAM) (no replication). all cache are disable in solrconfig.xml, The 
> rate of indexing is about 2000 docs/s, this transform cache useless 
> At the time of the perf test the amount of docs were 34M (now is 54 but the 
> set will grow to 600 millions more or less) with 7M (and growing) unique 
> keys. I’m indexing docs with an url and an user_id.
> {
> name: “url_encoded",
> type: "string",
> docValues: true,
> indexed: true,
> stored: true
> },
> {
> name: “user_id",
> type: "tlong",
> docValues: true,
> multiValued: false,
> indexed: true,
> stored: true
> },
> The query is simple, aggregate by url with a subfacet to each url to 
> calculate the estimate unique users
> I’m using Solr 5.3.1.
> - Normal query (I guess uses under the hood the DVs): 
> json.facet={url:{type:terms,field:url,limit:-1,sort:{index:asc},facet:{users:’hll(user_id)'}}}
> - Streaming query:  
> json.facet={url:{type:terms,field:url,limit:-1,sort:{index:asc},facet:{users:’hll(user_id)’},
>  method:stream}}
> This is a perf test to see if sorl has the capacity to aggregate the 600M url 
> with the unique users and the average response time (minutes is acceptable, 
> but less as possible is desirable)
> —/Yago Riveiro
> On Tue, Dec 22, 2015 at 3:27 PM, Yonik Seeley <ysee...@gmail.com> wrote:
>> On Tue, Dec 22, 2015 at 6:06 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:
>>> I’m surprised with the difference of speed between DV and stream, the same 
>>> query (aggregate 7M unique keys) with stream method takes 21s and with DV 
>>> is about 3 minutes ...
>> Wow - is this a "real" DV field, or one that was built on-demand in
>> the FieldCache?  Were those times for the first request, or subsequent
>> requests?
>> What are the characteristics of that field... i.e. how many unique
>> values in the shard (local index being queried) and how many typical
>> values per field?
>> And how many docs total on the shard?
>> -Yonik

Json facet api method stream

2015-12-21 Thread Yago Riveiro
Hi,

The json facet API method "stream" uses the docvalues internally for do the
aggregation on the fly?

I wan't to know if using this method justifies have the docvalues configured
in schema.



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Json-facet-api-method-stream-tp4246520.html
Sent from the Solr - User mailing list archive at Nabble.com.


Nested document query with wrong numFound value

2015-12-11 Thread Yago Riveiro
Hi,

I'm playing with the nested documents feature and after run this query:

http://localhost:8983/solr/ecommerce-15/query?q=id:3181426982318142698228*

The documents has the IDs:

- Parent :  3181426982318142698228
- Child_1 : 31814269823181426982280
- Child_2 : 31814269823181426982281


I have this return:

{
responseHeader: {
status: 0,
QTime: 3,
params: {
q: "id:3181426982318142698228*"
}
},
response: {
numFound: 3,
start: 0,
maxScore: 1,
docs: [{
id: "31814269823181426982280",
child_type: "ecommerce_product",
qty: 1,
product_price: 49.99
}, {
id: "31814269823181426982281",
child_type: "ecommerce_product",
qty: 1,
product_price: 139.9
}]
}
}

As you can see the numFound is 3, and I have only 2 child documents; isn't it
supposed to ignore the parent document?



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Nested-document-query-with-wrong-numFound-value-tp4244851.html
Sent from the Solr - User mailing list archive at Nabble.com.


Schema API, change the defaultoperator

2015-12-11 Thread Yago Riveiro
Hi,

How can I change the defaultoperator parameter through the schema API?

Thanks.



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-API-change-the-defaultoperator-tp4244857.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Nested document query with wrong numFound value

2015-12-11 Thread Yago Riveiro
hlight={time=0.0},stats={time=0.0},expand={time=0.0},debug={time=0.0};

}

},

GET_FIELDS: {


http: //node-01:8983/solr/ecommerce-15_shard1_replica2/: {


QTime: "3",


ElapsedTime: "4",


RequestPurpose: "GET_FIELDS,GET_DEBUG",


NumFound: "2",


Response: 
"{responseHeader={status=0,QTime=3,params={df=_text_,distrib=false,debug=[timing,
 
track],qt=/query,shards.purpose=320,shard.url=http://node-01:8983/solr/ecommerce-15_shard1_replica2/,rid=node-01-ecommerce-15_shard1_replica2-1449842438070-0,version=2,q=id:3181426982318142698228*,requestPurpose=GET_FIELDS,GET_DEBUG,NOW=1449842438070,ids=31814269823181426982281,31814269823181426982280,isShard=true,wt=javabin,debugQuery=true}},response={numFound=2,start=0,docs=[SolrDocument{id=31814269823181426982281,
 child_type=ecommerce_product, ref=5545562, name=smartphone vodafone samsung 
galaxy core prime branco, name_raw=Smartphone VODAFONE SAMSUNG Galaxy Core 
Prime Branco, cat=vodafone, cat_raw=Vodafone, qty=1, product_price=139.9}, 
SolrDocument{id=31814269823181426982280, child_type=ecommerce_product, 
ref=5439705, name=liquidificadora moulinex faciclic glass lm310e1, 
name_raw=Liquidificadora MOULINEX Faciclic Glass LM310E1, cat=liquidificadores, 
cat_raw=Liquidificadores, qty=1, 
product_price=49.99}]},debug={rawquerystring=id:3181426982318142698228*,querystring=id:3181426982318142698228*,parsedquery=id:3181426982318142698228*,parsedquery_toString=id:3181426982318142698228*,explain={31814269823181426982281=
 1.0 = id:3181426982318142698228*, product of: 1.0 = boost 1.0 = queryNorm 
,31814269823181426982280= 1.0 = id:3181426982318142698228*, product of: 1.0 = 
boost 1.0 = queryNorm 
},QParser=LuceneQParser,timing={time=3.0,prepare={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},debug={time=0.0}},process={time=3.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},debug={time=3.0}"

}

}

},

timing: {
  time: 3,
  prepare: {
    time: 0,
    query: { time: 0 },
    facet: { time: 0 },
    facet_module: { time: 0 },
    mlt: { time: 0 },
    highlight: { time: 0 },
    stats: { time: 0 },
    expand: { time: 0 },
    debug: { time: 0 }
  },
  process: {
    time: 3,
    query: { time: 0 },
    facet: { time: 0 },
    facet_module: { time: 0 },
    mlt: { time: 0 },
    highlight: { time: 0 },
    stats: { time: 0 },
    expand: { time: 0 },
    debug: { time: 3 }
  }
},
rawquerystring: "id:3181426982318142698228*",
querystring: "id:3181426982318142698228*",
parsedquery: "id:3181426982318142698228*",
parsedquery_toString: "id:3181426982318142698228*",
QParser: "LuceneQParser",
explain: {
  31814269823181426982280: " 1.0 = id:3181426982318142698228*, product of: 1.0 = boost 1.0 = queryNorm ",
  31814269823181426982281: " 1.0 = id:3181426982318142698228*, product of: 1.0 = boost 1.0 = queryNorm "
}
}
}




--
Yago Riveiro

On Fri, Dec 11, 2015 at 12:53 PM, Mikhail Khludnev
<mkhlud...@griddynamics.com> wrote:

> what do you see with debugQuery=true  ?
> On Fri, Dec 11, 2015 at 2:02 PM, Yago Riveiro <yago.rive...@gmail.com>
> wrote:
>> Hi,
>>
>> I'm playing with t

How Json facet API works with domains and facet functions?

2015-12-11 Thread Yago Riveiro
Hi,

How does the json facet api work with domains and facet functions?

I tried to google some info and I did not find anything useful.

How can I do a query that finds all parents that match a clause (a date) and
calculates the avg price of all the children that have property X?

Following yonik's blog example I try something like this:

http://localhost:8983/solr/query?q={!parent
which="parent_type:ecommerce"}date:2015-12-11T00:00:00Z={x:'avg(price)',
domain: { blockChildren : "parent_type:ecommerce"}} 

but doesn't work.



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-Json-facet-API-works-with-domains-and-facet-functions-tp4244907.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Schema API, change the defaultoperator

2015-12-11 Thread Yago Riveiro
I uploaded a schema.xml manually with the defaultoperator configuration and it's
working.

My problem is that my legacy application is huge and I can't go to all the places
to add the q.op parameter.

The solrconfig.xml option could be an option. Does the q.op param defined in
request handlers work with POST HTTP calls?
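
For reference, a minimal sketch of setting it once as a request-handler default in
solrconfig.xml (the handler name and df value are placeholders); defaults declared
this way apply to GET and POST requests alike, unless the client overrides them:

<requestHandler name="/query" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="q.op">AND</str>
    <str name="df">_text_</str>
  </lst>
</requestHandler>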

On Fri, Dec 11, 2015 at 2:26 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 12/11/2015 4:23 AM, Yago Riveiro wrote:
>> How can I change the defaultoperator parameter through the schema API?
> The default operator and default field settings in the schema have been
> deprecated for quite some time, so I would imagine that you can't change
> them with the schema API -- they shouldn't be there, so there's no need
> to support the ability to change them.
> Look into the q.op and df parameters, which can be defined in the
> request handler definition (solrconfig.xml) or passed in with the query.
> https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser#TheStandardQueryParser-StandardQueryParserParameters
> Thanks,
> Shawn

Re: Nested document query with wrong numFound value

2015-12-11 Thread Yago Riveiro
Mmmm,

In fact, if I run a json facet query the result count is 5 for both of
them; this is consistent with the debug query.

What I don't understand is where these documents come from.

I pre-cleaned the collection several times with a delete query (id:*) and always
indexed 31814269823181426982280 and 31814269823181426982281 as children of
3181426982318142698228.

Can this issue be related to SOLR-5211?

--
Yago Riveiro

On Fri, Dec 11, 2015 at 8:46 PM, Mikhail Khludnev
<mkhlud...@griddynamics.com> wrote:

> On Fri, Dec 11, 2015 at 11:05 PM, Yago Riveiro <yago.rive...@gmail.com>
> wrote:
>> When do you say that I have duplicates, what do you mean?
>>
> I mean
> http: //node-01:8983/solr/ecommerce-15_shard1_replica2/: {
> QTime: "0",
> ElapsedTime: "2",
> RequestPurpose: "GET_TOP_IDS",
> NumFound: "11",
> Response:
> "{responseHeader={status=0,QTime=0,params={df=_text_,distrib=false,debug=[false,
> timing, track],qt=/query,fl=[id,
> score],shards.purpose=4,start=0,fsv=true,shard.url=
> http://node-01:8983/solr/ecommerce-15_shard1_replica2/,rid=node-01-ecommerce-15_shard1_replica2-1449842438070-0,rows=10,version=2,q=id:3181426982318142698228*,requestPurpose=GET_TOP_IDS,NOW=1449842438070,isShard=true,wt=javabin,debugQuery=false}
> },response={numFound=11,start=0,maxScore=1.0,*docs=[**SolrDocument{id=**31814269823181426982280,
> score=1.0}, SolrDocument{id=**31814269823181426982280, score=1.0},
> SolrDocument{id=**31814269823181426982280, score=1.0},
> SolrDocument{id=**31814269823181426982280,
> score=1.0}, SolrDocument{id=**31814269823181426982280, score=1.0},
> SolrDocument{id=**31814269823181426982281, score=1.0},
> SolrDocument{id=**31814269823181426982281,
> score=1.0}, SolrDocument{id=**31814269823181426982281, score=1.0},
> SolrDocument{id=**31814269823181426982281, score=1.0},
> SolrDocument{id=**31814269823181426982281,
> score=1.0}]}*
> ,sort_values={},debug={timing={time=0.0,prepare={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},debug={time=0.0}},process={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},debug={time=0.0}"
> Perhaps, it's worth to verify shards one by one sending requests with
> distrib=false.
>>
>>
>> If I have duplicate documents is not intentional, each document must be
>> unique.
>>
>>
>> Running a query for each id:
>>
>>
>>
>>
>>
>> - Parent :  3181426982318142698228
>>
>> - Child_1 : 31814269823181426982280
>>
>> - Child_2 : 31814269823181426982281
>>
>>
>>
>>
>> The result is one document for each …
>>
>>
>>
>>
>>
>> responseHeader:
>> {
>> status: 0,
>>
>>
>>
>> QTime: 3,
>>
>>
>>
>> params:
>> {
>> q: "id:3181426982318142698228",
>>
>>
>>
>> fl: "id",
>>
>>
>>
>> q.op: "AND"
>>
>>
>>
>> }
>>
>>
>> },
>>
>>
>>
>> response:
>> {
>> numFound: 1,
>>
>>
>>
>> start: 0,
>>
>>
>>
>> maxScore: 11.017976,
>>
>>
>>
>> docs:
>> [
>>
>> {
>> id: "3181426982318142698228"
>>
>>
>> }
>>
>> ]
>>
>>
>> }
>>
>>
>>
>>
>>
>>
>>
>> responseHeader:
>> {
>> status: 0,
>>
>>
>>
>> QTime: 3,
>>
>>
>>
>> params:
>> {
>> q: "id:31814269823181426982280",
>>
>>
>>
>> fl: "id",
>>
>>
>>
>> q.op: "AND"
>>
>>
>>
>> }
>>
>>
>> },
>>
>>
>>
>> response:
>> {
>> numFound: 1,
>>
>>
>>
>> start: 0,
>>
>>
>>
>> maxScore: 9.919363,
>>
>>
>>
>> docs:
>> [
>>
>> {
>> id: "31814269823181426982280"
>>
>>
>> }
>>
>> ]
>>
>>
>> }
>>
>>
>>
>>
>>
>>
>> responseHeader:
>> {
>> status: 0,
>>
>>
>>
>> QTime: 3,
>>
>>
>>
>> params:
>> {
>> q: "id:318142698231814269

Re: Nested document query with wrong numFound value

2015-12-11 Thread Yago Riveiro
When you say that I have duplicates, what do you mean?

If I have duplicate documents it is not intentional; each document must be unique.

Running a query for each id:

- Parent :  3181426982318142698228
- Child_1 : 31814269823181426982280
- Child_2 : 31814269823181426982281

The result is one document for each ...





responseHeader: {
  status: 0,
  QTime: 3,
  params: {
    q: "id:3181426982318142698228",
    fl: "id",
    q.op: "AND"
  }
},
response: {
  numFound: 1,
  start: 0,
  maxScore: 11.017976,
  docs: [
    { id: "3181426982318142698228" }
  ]
}


responseHeader: {
  status: 0,
  QTime: 3,
  params: {
    q: "id:31814269823181426982280",
    fl: "id",
    q.op: "AND"
  }
},
response: {
  numFound: 1,
  start: 0,
  maxScore: 9.919363,
  docs: [
    { id: "31814269823181426982280" }
  ]
}


responseHeader: {
  status: 0,
  QTime: 3,
  params: {
    q: "id:31814269823181426982281",
    fl: "id",
    q.op: "AND"
  }
},
response: {
  numFound: 1,
  start: 0,
  maxScore: 9.919363,
  docs: [
    { id: "31814269823181426982281" }
  ]
}


--
Yago Riveiro





Ok. I got it. SolrCloud relies on uniqueKey (id) for merging shard results,

but in your examples it doesn't work, because nested documents disables

this. And you have duplicates, which make merge heap mad:


false}

<http://node-01:8983/solr/ecommerce-15_shard1_replica2/,rid=node-01-ecommerce-15_shard1_replica2-1449842438070-0,rows=10,version=2,q=id:3181426982318142698228*,requestPurpose=GET_TOP_IDS,NOW=1449842438070,isShard=true,wt=javabin,debugQuery=false%7D>},response={numFound=11,start=0,maxScore=1.0,docs=[SolrDocument{id=31814269823181426982280,

score=1.0}, SolrDocument{id=31814269823181426982280, score=1.0},

SolrDocument{id=31814269823181426982280, score=1.0},

SolrDocument{id=31814269823181426982280, score=1.0},

SolrDocument{id=31814269823181426982280, score=1.0},

SolrDocument{id=31814269823181426982281, score=1.0},

SolrDocument{id=31814269823181426982281, score=1.0},

SolrDocument{id=31814269823181426982281, score=1.0},

SolrDocument{id=31814269823181426982281, score=1.0},


Yago, you encounter a quite curious fact. Congratulation!

You can only retrieve parent document with SolrCloud, hence use {!parent

..}.. of fq=type:parent.


ccing Devs:

Shouldn't it prosecute ID dupes explicitly? Is it a known feature?



On Fri, Dec 11, 2015 at 5:08 PM, Yago Riveiro <yago.rive...@gmail.com>

wrote:


> This:
>
> {
>   responseHeader: {
>     status: 0,
>     QTime: 10,
>     params: {
>       q: "id:3181426982318142698228*",
>       debugQuery: "true"
>     }
>   },
>   response: {
>     numFound: 3,
>     start: 0,
>     maxScore: 1,
>     docs: [{
>       id: "31814269823181426982280",
>       child_type: "ecommerce_product",
>       qty: 1,
>       product_price: 49.99
>     }, {
>       id: "31814269823181426982281",
>       child_type: "ecommerce_product",
>       qty: 1,
>       product_price: 139.9
>     }]
>   },
>   debug: {
>     track: {
>       rid: "node-01-ecommerce-15_shard1_replica2-1449842438070-0",
>       EXECUTE_QUERY: {
>         http://node-17:8983/solr/ecommerce-15_shard2_replica1/: {
>           QTime: "0",
>           ElapsedTime: "2",
>           RequestPurpose: "GET_TOP_IDS",
>           NumFound: "0",
>           Response: "{responseHeader={status=0,QTime=0,params={df=_text_,distrib=false,debug=[false, timing, track],qt=/query,fl=[id, score],shards.purpose=4,start=0,fsv=true,shard.url=http://node-17:8983/solr/ecommerce-15_shard2_replica1/,rid=node-01-ecommerce

Re: How Json facet API works with domains and facet functions?

2015-12-11 Thread Yago Riveiro
One more question.

Is it possible to use the domain clause in json facet without a term query?

Ex.

json.facet={
    x: 'avg(price)',
    domain: { blockChildren : "parent_type:ecommerce" }
}

Does this make any sense, or should I always reduce the domain using the query and
filters?
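To make the question concrete, the full request being asked about would look roughly
like this (the endpoint and q=*:* are illustrative; whether a bare stat facet accepts
a domain this way is exactly what is being asked):

http://localhost:8983/solr/query?q=*:*&rows=0&json.facet={
  x: 'avg(price)',
  domain: { blockChildren: "parent_type:ecommerce" }
}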




—/Yago Riveiro

On Fri, Dec 11, 2015 at 5:17 PM, Yonik Seeley <ysee...@gmail.com> wrote:

> If you search on the parents and want to match child documents, I
> think you want {!child} and not {!parent} in your queries or filters.
> fq={!child of=...}date_query_on_parents
> fq=child_prop:X
> For this specific example, you don't even need the block-join support
> in facets since the base domain (query+filters) will already be the
> child docs you want to facet over.
> -Yonik
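(Spelling that suggestion out against the examples in this thread -- a sketch only;
the date range and the child property filter are invented for illustration, the field
names are taken from the earlier messages:

http://localhost:8983/solr/query?q=*:*
  &fq={!child of="parent_type:ecommerce"}date:[2015-12-11T00:00:00Z TO *]
  &fq=child_type:ecommerce_product
  &json.facet={x:'avg(price)'}
)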
> On Fri, Dec 11, 2015 at 11:46 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:
>> Hi,
>>
>> How the json facet api works with domains and facet functions?
>>
>> I try to google some info and I do not find nothing useful.
>>
>> How can I do a query that finds all parents that match a clause (a date) and
>> calculates the avg price of all of the children that have property X?
>>
>> Following yonik's blog example I try something like this:
>>
>> http://localhost:8983/solr/query?q={!parent
>> which="parent_type:ecommerce"}date:2015-12-11T00:00:00Z&json.facet={x:'avg(price)',
>> domain: { blockChildren : "parent_type:ecommerce"}}
>>
>> but doesn't work.
>>
>>
>>
>> -
>> Best regards
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/How-Json-facet-API-works-with-domains-and-facet-functions-tp4244907.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

Json facet api NullPointerException

2015-11-12 Thread Yago Riveiro
Hi,

I'm hitting this NullPointerException using the json facet API.

The same query using the Facet component works.

Json facet query:

curl -s http://node1:8983/solr/metrics/query -d
'q=datetime:[2015-10-01T00:00:00Z TO
2015-10-04T23:59:59Z]&rows=0&json.facet={
  urls: {
    type: terms,
    field: url,
    limit: -1,
    sort: index,
    numBuckets: true
  }}'

Facet component query:

http://node1:8983/solr/metrics/query?q=datetime:[2015-10-01T00:00:00Z%20TO%202015-10-04T23:59:59Z]=true=url=1=-1=0=json=1=index

Total elements returned: 1971203
Total unique elements returned: 307570

Json facet api response:

2015-11-12 15:29:53.130 ERROR (qtp1510067370-34151) [c:metrics:shard1
r:core_node5 x:metrics_shard1_replica2] o.a.s.s.SolrDispatchFilter
null:java.lang.NullPointerException
at
org.apache.solr.search.facet.FacetFieldProcessorFCBase$1.lessThan(FacetField.java:573)
at
org.apache.solr.search.facet.FacetFieldProcessorFCBase$1.lessThan(FacetField.java:570)
at
org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:258)
at org.apache.lucene.util.PriorityQueue.add(PriorityQueue.java:135)
at
org.apache.solr.search.facet.FacetFieldProcessorFCBase.findTopSlots(FacetField.java:603)
at
org.apache.solr.search.facet.FacetFieldProcessorFCBase.getFieldCacheCounts(FacetField.java:547)
at
org.apache.solr.search.facet.FacetFieldProcessorFCBase.process(FacetField.java:512)
at
org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:222)
at
org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:313)
at
org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:57)
at
org.apache.solr.search.facet.FacetModule.process(FacetModule.java:87)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Json-facet-api-NullPointerException-tp4239900.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Json facet api NullPointerException

2015-11-12 Thread Yago Riveiro
Solr 5.3.1


—/Yago Riveiro

On Thu, Nov 12, 2015 at 4:21 PM, Yonik Seeley <ysee...@gmail.com> wrote:

> Thanks for the report Yago,
> What version is this?
> -Yonik
> On Thu, Nov 12, 2015 at 10:53 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:
>> Hi,
>>
>> I'm hitting this NullPointerException using the json facet API.
>>
>> Same query using Facet component is working.
>>
>> Json facet query:
>>
>> curl -s http://node1:8983/solr/metrics/query -d
>> 'q=datetime:[2015-10-01T00:00:00Z TO
>> 2015-10-04T23:59:59Z]=0={
>> urls: {
>> type: terms,
>> field: url,
>> limit: -1,
>> sort: index,
>> numBuckets: true
>> }}'
>>
>> Facet component query:
>>
>> http://node1:8983/solr/metrics/query?q=datetime:[2015-10-01T00:00:00Z%20TO%202015-10-04T23:59:59Z]=true=url=1=-1=0=json=1=index
>>
>> Total elements returned: 1971203
>> Total unique elements returned: 307570
>>
>> Json facet api response:
>>
>> 2015-11-12 15:29:53.130 ERROR (qtp1510067370-34151) [c:metrics:shard1
>> r:core_node5 x:metrics_shard1_replica2] o.a.s.s.SolrDispatchFilter
>> null:java.lang.NullPointerException
>> at
>> org.apache.solr.search.facet.FacetFieldProcessorFCBase$1.lessThan(FacetField.java:573)
>> at
>> org.apache.solr.search.facet.FacetFieldProcessorFCBase$1.lessThan(FacetField.java:570)
>> at
>> org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:258)
>> at org.apache.lucene.util.PriorityQueue.add(PriorityQueue.java:135)
>> at
>> org.apache.solr.search.facet.FacetFieldProcessorFCBase.findTopSlots(FacetField.java:603)
>> at
>> org.apache.solr.search.facet.FacetFieldProcessorFCBase.getFieldCacheCounts(FacetField.java:547)
>> at
>> org.apache.solr.search.facet.FacetFieldProcessorFCBase.process(FacetField.java:512)
>> at
>> org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:222)
>> at
>> org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:313)
>> at
>> org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:57)
>> at
>> org.apache.solr.search.facet.FacetModule.process(FacetModule.java:87)
>> at
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)
>> at
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
>> at
>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
>> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
>> at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>> at
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>> at
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>> at
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>> at
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>> at
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>> at
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>> at
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>> at
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>> at org.eclipse.jetty.server.Server.handle(Server.java:499)
>> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>> at
>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>> at
>> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>> at
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>> at
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>> at java.lang.Thread.run(Thread.java:745)
>>
>>
>>
>> -
>> Best regards
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Json-facet-api-NullPointerException-tp4239900.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

Re: Json facet api NullPointerException

2015-11-12 Thread Yago Riveiro
I found the bug …

In my query I have

sort: index,

and it should be

sort: {index: desc|asc}

I think that the JSON parser should raise a "json parsing error" instead of an NPE ...
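For reference, the query from the start of this thread with the corrected sort (a
sketch only; the rows/json.facet parameter names are assumed, since the archive
stripped them from the original message):

curl -s http://node1:8983/solr/metrics/query -d
'q=datetime:[2015-10-01T00:00:00Z TO
2015-10-04T23:59:59Z]&rows=0&json.facet={
  urls: {
    type: terms,
    field: url,
    limit: -1,
    sort: {index: asc},
    numBuckets: true
  }}'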


—/Yago Riveiro

On Thu, Nov 12, 2015 at 4:44 PM, Yago Riveiro <yago.rive...@gmail.com>
wrote:

> Solr 5.3.1
> —/Yago Riveiro
> On Thu, Nov 12, 2015 at 4:21 PM, Yonik Seeley <ysee...@gmail.com> wrote:
>> Thanks for the report Yago,
>> What version is this?
>> -Yonik
>> On Thu, Nov 12, 2015 at 10:53 AM, Yago Riveiro <yago.rive...@gmail.com> 
>> wrote:
>>> Hi,
>>>
>>> I'm hitting this NullPointerException using the json facet API.
>>>
>>> Same query using Facet component is working.
>>>
>>> Json facet query:
>>>
>>> curl -s http://node1:8983/solr/metrics/query -d
>>> 'q=datetime:[2015-10-01T00:00:00Z TO
>>> 2015-10-04T23:59:59Z]=0={
>>> urls: {
>>> type: terms,
>>> field: url,
>>> limit: -1,
>>> sort: index,
>>> numBuckets: true
>>> }}'
>>>
>>> Facet component query:
>>>
>>> http://node1:8983/solr/metrics/query?q=datetime:[2015-10-01T00:00:00Z%20TO%202015-10-04T23:59:59Z]=true=url=1=-1=0=json=1=index
>>>
>>> Total elements returned: 1971203
>>> Total unique elements returned: 307570
>>>
>>> Json facet api response:
>>>
>>> 2015-11-12 15:29:53.130 ERROR (qtp1510067370-34151) [c:metrics:shard1
>>> r:core_node5 x:metrics_shard1_replica2] o.a.s.s.SolrDispatchFilter
>>> null:java.lang.NullPointerException
>>> at
>>> org.apache.solr.search.facet.FacetFieldProcessorFCBase$1.lessThan(FacetField.java:573)
>>> at
>>> org.apache.solr.search.facet.FacetFieldProcessorFCBase$1.lessThan(FacetField.java:570)
>>> at
>>> org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:258)
>>> at org.apache.lucene.util.PriorityQueue.add(PriorityQueue.java:135)
>>> at
>>> org.apache.solr.search.facet.FacetFieldProcessorFCBase.findTopSlots(FacetField.java:603)
>>> at
>>> org.apache.solr.search.facet.FacetFieldProcessorFCBase.getFieldCacheCounts(FacetField.java:547)
>>> at
>>> org.apache.solr.search.facet.FacetFieldProcessorFCBase.process(FacetField.java:512)
>>> at
>>> org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:222)
>>> at
>>> org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:313)
>>> at
>>> org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:57)
>>> at
>>> org.apache.solr.search.facet.FacetModule.process(FacetModule.java:87)
>>> at
>>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)
>>> at
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
>>> at
>>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
>>> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
>>> at
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
>>> at
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
>>> at
>>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>>> at
>>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>>> at
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>>> at
>>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>>> at
>>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>>> at
>>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>>> at
>>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>>> at
>>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>>> at
>>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>>> at
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>>> 

Re: The time that init.d script waits before shutdown should be configurable

2015-11-10 Thread Yago Riveiro
Patch attached to https://issues.apache.org/jira/browse/SOLR-8065

The Windows script is voodoo for me :D, I don't have the knowledge to port this to
a cmd script.
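Roughly, the shape of the change (a sketch only -- the variable name and loop are
illustrative, the real diff is in the SOLR-8065 patch):

# solr.in.sh: seconds to wait for a graceful shutdown before falling back to kill -9
SOLR_STOP_WAIT=180

# bin/solr stop: replace the fixed 5-second sleep with a bounded poll
timeout=$SOLR_STOP_WAIT
while kill -0 "$SOLR_PID" 2>/dev/null && [ "$timeout" -gt 0 ]; do
  sleep 1
  timeout=$((timeout - 1))
done
if kill -0 "$SOLR_PID" 2>/dev/null; then
  kill -9 "$SOLR_PID"   # last resort, as the current script already does
fi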


—/Yago Riveiro

On Mon, Nov 9, 2015 at 3:23 PM, Upayavira <u...@odoko.co.uk> wrote:

> Yago,
> I think a JIRA has been raised for this. I'd encourage you to hunt it
> down and make a patch.
> Upayavira
> On Mon, Nov 9, 2015, at 03:09 PM, Yago Riveiro wrote:
>> The time that init.d script waits before shutdown should be configurable
>> 
>> The 5 seconds is not enough for all my shards to notify the shutdown, and the
>> process ends with a kill command.
>> 
>> I think that solr.in.sh should have an entry to configure the time to
>> wait before using a kill command.
>> 
>> 
>> 
>> -
>> Best regards
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/The-time-that-init-d-script-waits-before-shutdown-should-be-configurable-tp4239143.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
