Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread Michael Joyner
That is the way we do it here - also helps a lot with not needing x2 or 
x3 disk space to handle the merge:


public void solrOptimize() {
    int initialMaxSegments = 256;
    int finalMaxSegments = 4;
    if (isShowSegmentCounter()) {
        log.info("Optimizing ...");
    }
    try (SolrClient solrServerInstance = getSolrClientInstance()) {
        for (int segments = initialMaxSegments; segments >= finalMaxSegments; segments--) {
            if (isShowSegmentCounter()) {
                System.out.println("Optimizing to a max of " + segments + " segments.");
            }
            try {
                solrServerInstance.optimize(true, true, segments);
            } catch (RemoteSolrException | SolrServerException | IOException e) {
                log.severe(e.getMessage());
            }
        }
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

On 6/7/19 4:56 AM, Nicolas Franck wrote:

In that case, hard optimisation like that is out of the question.
Resort to automatic merge policies, specifying a maximum
number of segments. Solr is designed with multiple segments
in mind; hard optimisation is usually not worth the trouble.

The problem is this: the fewer segments you specify during
an optimisation, the longer it will take, because it has to read
all of the segments to be merged and redo the sorting. And a cluster
has a lot of housekeeping on top of that.

If you really want to issue an optimisation, then you can
also do it in steps (max segments parameter):

10 -> 9 -> 8 -> 7 .. -> 1

That way fewer segments need to be merged in one go.

Testing your index will show you what a good maximum
number of segments is for your index.


On 7 Jun 2019, at 07:27, jena  wrote:

Hello guys,

We have 4 Solr (version 4.4) instances in our production environment, which are
linked/associated with ZooKeeper for replication. We do heavy delete & add
operations. We have around 26 million records and the index size is around
70GB. We serve 100k+ requests per day.


Because of the heavy indexing & deletion, we optimise the Solr instances every
day, and because of that our Solr cloud is getting unstable: every Solr instance
goes into recovery mode, and our search is affected & very slow as a result.
Optimisation takes around 1hr 30 minutes.
We are not able to fix this issue, please help.

Thanks & Regards







Re: SolrJ does not use HTTP proxy anymore in 7.5.0 after update from 6.6.5

2018-10-12 Thread Michael Joyner
Would you supply the snippet for the custom HttpClient to get it to 
honor/use proxy?


Thanks!

On 10/10/2018 10:50 AM, Andreas Hubold wrote:
Thank you, Shawn. I'm now using a custom HttpClient that I create in a 
similar manner as SolrJ, and it works quite well.


Of course, a fix in a future release would be great, so that we can 
remove the workaround eventually.


Thanks,
Andreas

Shawn Heisey schrieb am 10.10.2018 um 16:31:

On 10/1/2018 6:54 AM, Andreas Hubold wrote:
Is there some other way to configure an HTTP proxy, e.g. with 
HttpSolrClient.Builder? I don't want to create an Apache HttpClient 
instance myself, but rather use the builder from SolrJ (HttpSolrClient.Builder).


Unless you want to wait for a fix for SOLR-12848, you have two options:

1) Use a SolrJ client from 6.6.x, before the fix for SOLR-4509. If 
you're using HttpSolrClient rather than CloudSolrClient, a SolrJ 
major version that's different than your Solr major version won't be 
a big problem.  Large version discrepancies can be very problematic 
with the Cloud client.


2) Create a custom HttpClient instance with the configuration you 
want and use that to build your SolrClient instances.  If you're 
using the Solr client in a multi-threaded manner, you'll want to be 
sure that the HttpClient is defined to allow enough threads -- it 
defaults to two.
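
For anyone looking for the snippet requested earlier in this thread, here is a
minimal sketch of option 2. It assumes SolrJ 7.x with Apache HttpComponents 4.x
on the classpath; the proxy host/port, pool sizes, and Solr URL are placeholder
values, not anything prescribed by SolrJ:

import org.apache.http.HttpHost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class ProxiedSolrClientFactory {

    // Build a SolrClient whose underlying HttpClient routes through a proxy.
    // Host names, ports, and pool sizes are illustrative only.
    public static SolrClient create() {
        CloseableHttpClient httpClient = HttpClientBuilder.create()
                .setProxy(new HttpHost("proxy.example.com", 3128))
                .setMaxConnTotal(128)     // allow enough connections for multi-threaded use
                .setMaxConnPerRoute(32)
                .build();
        return new HttpSolrClient.Builder("http://solr.example.com:8983/solr")
                .withHttpClient(httpClient)
                .build();
    }
}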


I do think this particular problem is something we should fix. But 
that doesn't help you in the short term.  It could take several weeks 
(or maybe longer) for a fix from us to arrive in your hands, unless 
you're willing to compile from source.


Thanks,
Shawn








Is there an easy way to compare schemas?

2018-09-24 Thread Michael Joyner

Is there an easy way to compare schemas?

When upgrading nodes, we want to compare the "core" and 
"automatically mapped" data types between our existing schema and the 
new managed-schema available as part of the upgraded distribution.


Re: Solr Cloud 7.3.1 backups (autofs/NFS)

2018-06-01 Thread Michael Joyner
It simply automounts NFS mount points under /net/$host/$export, so it is no 
different from having manually mounted NFS mount points for the purposes 
of backing up.


Just be sure that your NFS host is set to export the appropriate file 
system location with the needed netmask so that your various nodes have 
permission to mount the exports with the appropriate r/w permissions. As 
long as your various nodes (on-demand or otherwise) have autofs installed 
and enabled, simply accessing the appropriate /net/$host/$export folder 
will automount the export when needed. (You only need autofs installed 
and enabled on the nodes; only the NFS host needs any real configuration 
this way.)
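
For reference, here is a minimal SolrJ sketch of triggering a collection backup
onto such an automounted path via the Collections API. The collection name,
backup name, node URL, and /net path are placeholders, and it assumes the same
NFS export is reachable at that path on every node (which the autofs
/net/$host/$export convention provides):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CollectionBackup {
    public static void main(String[] args) throws Exception {
        try (SolrClient client =
                     new HttpSolrClient.Builder("http://solr-node1.example.com:8983/solr").build()) {
            // The location must resolve to the same shared storage on every node.
            CollectionAdminRequest.Backup backup =
                    CollectionAdminRequest.backupCollection("mycollection", "nightly-backup");
            backup.setLocation("/net/nfshost.example.com/export/solr-backups");
            backup.process(client);
        }
    }
}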



On 05/31/2018 05:28 PM, Greg Roodt wrote:

Thanks! I wasn't aware this existed.

Have you used it with Solr backups?




On Fri, 1 Jun 2018 at 00:07, Michael Joyner <mailto:mich...@newsrx.com>> wrote:




On 05/30/2018 05:16 PM, Greg Roodt wrote:

It's going to take a bit of coordination to get all nodes to mount a shared
volume when we take a backup and then unmount when done.



autofs should help with that.





Re: Shard size variation

2018-05-03 Thread Michael Joyner
We generally try not to change defaults when possible; it sounds like there 
will be new default settings for the segment sizes and merge policy?


Am I right in thinking that expungeDeletes will (in theory) be a 7.4 
onwards option?



On 05/02/2018 01:29 PM, Erick Erickson wrote:

You can always increase the maximum segment size. For large indexes
that should reduce the number of segments. But watch your indexing
stats; I can't predict the consequences of bumping it to 100G, for
instance. I'd _expect_ bursty I/O when those large segments start
to be created or merged.

You'll be interested in LUCENE-7976 (Solr 7.4?), especially (probably)
the idea of increasing the segment sizes and/or a related JIRA that
allows you to tweak how aggressively solr merges segments that have
deleted docs.

NOTE: that JIRA has the consequence that _by default_ the optimize
with no parameters respects the maximum segment size, which is a
change from now.

Finally, expungeDeletes may be useful as that too will respect max
segment size, again after LUCENE-7976 is committed.

Best,
Erick

On Wed, May 2, 2018 at 9:22 AM, Michael Joyner <mich...@newsrx.com> wrote:

The main reason we go this route is that after a while (with default
settings) we end up with hundreds of segments and performance of course drops
abysmally as a result. By using a stepped optimize a) we don't run into the
3x+ headroom issue, and b) the performance penalty during the optimize is less
than the penalty of leaving hundreds of segments unoptimized.

BTW, as we use a batched insert/update cycle [once daily], we only optimize
down to a single segment after a complete batch has been run. Though
during the batch we reduce segment counts down to a max of 16 every 250K
inserts/updates to prevent the large-segment-count performance penalty.


On 04/30/2018 07:10 PM, Erick Erickson wrote:

There's really no good way to purge deleted documents from the index
other than to wait until merging happens.

Optimize/forceMerge and expungeDeletes both suffer from the problem
that they create massive segments that then stick around for a very
long time, see:

https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

Best,
Erick

On Mon, Apr 30, 2018 at 1:56 PM, Michael Joyner <mich...@newsrx.com>
wrote:

Based on experience, 2x headroom is not always enough, sometimes
not even 3x, if you are optimizing from many segments down to 1 segment
in a single go.

We have however figured out a way that can work with as little as 51% free
space via the following iteration cycle:

public void solrOptimize() {
    int initialMaxSegments = 256;
    int finalMaxSegments = 1;
    if (isShowSegmentCounter()) {
        log.info("Optimizing ...");
    }
    try (SolrClient solrServerInstance = getSolrClientInstance()) {
        for (int segments = initialMaxSegments; segments >= finalMaxSegments; segments--) {
            if (isShowSegmentCounter()) {
                System.out.println("Optimizing to a max of " + segments + " segments.");
            }
            solrServerInstance.optimize(true, true, segments);
        }
    } catch (SolrServerException | IOException e) {
        throw new RuntimeException(e);
    }
}


On 04/30/2018 04:23 PM, Walter Underwood wrote:

You need 2X the minimum index size in disk space anyway, so don’t worry
about keeping the indexes as small as possible. Worry about having
enough
headroom.

If your indexes are 250 GB, you need 250 GB of free space.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Apr 30, 2018, at 1:13 PM, Antony A <antonyaugus...@gmail.com> wrote:

Thanks Erick/Deepak.

The cloud is running on baremetal (128 GB/24 cpu).

Is there an option to run a compact on the data files to make the size
equal on both the clouds? I am trying to find all the options before I add
the new fields into the production cloud.

Thanks
AA

On Mon, Apr 30, 2018 at 10:45 AM, Erick Erickson
<erickerick...@gmail.com>
wrote:


Anthony:

You are probably seeing the results of removing deleted documents from
the shards as they're merged. Even on replicas in the same _shard_,
the size of the index on disk won't necessarily be identical. This has
to do with which segments are selected for merging, which are not
necessarily coordinated across replicas.

The test is if the number of docs on each collection is the same. If
it is, then don't worry about index sizes.

Best,
Erick

On Mon, Apr 30, 2018 at 9:38 AM, Deepak Goel <deic...@gmail.com>
wrote:

Could you please also give the machine details of the two clouds you
are
running?



Deepak
"The greatness of a nation can be judged by the way its animals are
treated. Please stop cruelty to Animals, become a Vegan"

+91 73500 12833
deic...@gmail.com

Facebook: https

Re: Shard size variation

2018-05-02 Thread Michael Joyner
The main reason we go this route is that after a while (with default 
settings) we end up with hundreds of segments and performance of course 
drops abysmally as a result. By using a stepped optimize a) we don't run 
into the 3x+ headroom issue, and b) the performance penalty during the 
optimize is less than the penalty of leaving hundreds of segments 
unoptimized.


BTW, as we use a batched insert/update cycle [once daily], we only 
optimize down to a single segment after a complete batch has been run. 
Though during the batch we reduce segment counts down to a max of 16 
every 250K inserts/updates to prevent the large-segment-count performance 
penalty.



On 04/30/2018 07:10 PM, Erick Erickson wrote:

There's really no good way to purge deleted documents from the index
other than to wait until merging happens.

Optimize/forceMerge and expungeDeletes both suffer from the problem
that they create massive segments that then stick around for a very
long time, see:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

Best,
Erick

On Mon, Apr 30, 2018 at 1:56 PM, Michael Joyner <mich...@newsrx.com> wrote:

Based on experience, 2x headroom is not always enough, sometimes
not even 3x, if you are optimizing from many segments down to 1 segment in a
single go.

We have however figured out a way that can work with as little as 51% free
space via the following iteration cycle:

public void solrOptimize() {
    int initialMaxSegments = 256;
    int finalMaxSegments = 1;
    if (isShowSegmentCounter()) {
        log.info("Optimizing ...");
    }
    try (SolrClient solrServerInstance = getSolrClientInstance()) {
        for (int segments = initialMaxSegments; segments >= finalMaxSegments; segments--) {
            if (isShowSegmentCounter()) {
                System.out.println("Optimizing to a max of " + segments + " segments.");
            }
            solrServerInstance.optimize(true, true, segments);
        }
    } catch (SolrServerException | IOException e) {
        throw new RuntimeException(e);
    }
}


On 04/30/2018 04:23 PM, Walter Underwood wrote:

You need 2X the minimum index size in disk space anyway, so don’t worry
about keeping the indexes as small as possible. Worry about having enough
headroom.

If your indexes are 250 GB, you need 250 GB of free space.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Apr 30, 2018, at 1:13 PM, Antony A <antonyaugus...@gmail.com> wrote:

Thanks Erick/Deepak.

The cloud is running on baremetal (128 GB/24 cpu).

Is there an option to run a compact on the data files to make the size
equal on both the clouds? I am trying to find all the options before I add
the new fields into the production cloud.

Thanks
AA

On Mon, Apr 30, 2018 at 10:45 AM, Erick Erickson
<erickerick...@gmail.com>
wrote:


Anthony:

You are probably seeing the results of removing deleted documents from
the shards as they're merged. Even on replicas in the same _shard_,
the size of the index on disk won't necessarily be identical. This has
to do with which segments are selected for merging, which are not
necessarily coordinated across replicas.

The test is if the number of docs on each collection is the same. If
it is, then don't worry about index sizes.

Best,
Erick

On Mon, Apr 30, 2018 at 9:38 AM, Deepak Goel <deic...@gmail.com> wrote:

Could you please also give the machine details of the two clouds you
are
running?



Deepak
"The greatness of a nation can be judged by the way its animals are
treated. Please stop cruelty to Animals, become a Vegan"

+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

Make In India : http://www.makeinindia.com/home

On Mon, Apr 30, 2018 at 9:51 PM, Antony A <antonyaugus...@gmail.com>

wrote:

Hi Shawn,

The cloud is running version 6.2.1. with ClassicIndexSchemaFactory

The sum of size from admin UI on all the shards is around 265 G vs 224
G
between the two clouds.

I created the collection using "numShards" so compositeId router.

If you need more information, please let me know.

Thanks
AA

On Mon, Apr 30, 2018 at 10:04 AM, Shawn Heisey <apa...@elyograg.org>
wrote:


On 4/30/2018 9:51 AM, Antony A wrote:


I am running two separate solr clouds. I have 8 shards in each with a total
of 300 million documents. Both the clouds are indexing the document from
the same source/configuration.

I am noticing there is a difference in the size of the collection between
them. I am planning to add more shards to see if that helps solve the
issue. Has anyone come across similar issue?


There's no information here about exactly what you are seeing, what you
are expecting to see, and why you believe that what 

Re: Shard size variation

2018-04-30 Thread Michael Joyner
Based on experience, 2x headroom is not always enough, 
sometimes not even 3x, if you are optimizing from many segments down to 
1 segment in a single go.


We have however figured out a way that can work with as little as 51% 
free space via the following iteration cycle:


public void solrOptimize() {
    int initialMaxSegments = 256;
    int finalMaxSegments = 1;
    if (isShowSegmentCounter()) {
        log.info("Optimizing ...");
    }
    try (SolrClient solrServerInstance = getSolrClientInstance()) {
        for (int segments = initialMaxSegments; segments >= finalMaxSegments; segments--) {
            if (isShowSegmentCounter()) {
                System.out.println("Optimizing to a max of " + segments + " segments.");
            }
            solrServerInstance.optimize(true, true, segments);
        }
    } catch (SolrServerException | IOException e) {
        throw new RuntimeException(e);
    }
}


On 04/30/2018 04:23 PM, Walter Underwood wrote:

You need 2X the minimum index size in disk space anyway, so don’t worry about 
keeping the indexes as small as possible. Worry about having enough headroom.

If your indexes are 250 GB, you need 250 GB of free space.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Apr 30, 2018, at 1:13 PM, Antony A  wrote:

Thanks Erick/Deepak.

The cloud is running on baremetal (128 GB/24 cpu).

Is there an option to run a compact on the data files to make the size
equal on both the clouds? I am trying to find all the options before I add the
new fields into the production cloud.

Thanks
AA

On Mon, Apr 30, 2018 at 10:45 AM, Erick Erickson 
wrote:


Anthony:

You are probably seeing the results of removing deleted documents from
the shards as they're merged. Even on replicas in the same _shard_,
the size of the index on disk won't necessarily be identical. This has
to do with which segments are selected for merging, which are not
necessarily coordinated across replicas.

The test is if the number of docs on each collection is the same. If
it is, then don't worry about index sizes.

Best,
Erick

On Mon, Apr 30, 2018 at 9:38 AM, Deepak Goel  wrote:

Could you please also give the machine details of the two clouds you are
running?



Deepak
"The greatness of a nation can be judged by the way its animals are
treated. Please stop cruelty to Animals, become a Vegan"

+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

Make In India : http://www.makeinindia.com/home

On Mon, Apr 30, 2018 at 9:51 PM, Antony A 

wrote:

Hi Shawn,

The cloud is running version 6.2.1. with ClassicIndexSchemaFactory

The sum of size from admin UI on all the shards is around 265 G vs 224 G
between the two clouds.

I created the collection using "numShards" so compositeId router.

If you need more information, please let me know.

Thanks
AA

On Mon, Apr 30, 2018 at 10:04 AM, Shawn Heisey 
wrote:


On 4/30/2018 9:51 AM, Antony A wrote:


I am running two separate solr clouds. I have 8 shards in each with a total
of 300 million documents. Both the clouds are indexing the document from
the same source/configuration.

I am noticing there is a difference in the size of the collection between
them. I am planning to add more shards to see if that helps solve the
issue. Has anyone come across similar issue?


There's no information here about exactly what you are seeing, what you
are expecting to see, and why you believe that what you are seeing is wrong.

You did say that there is "a difference in size".  That is a very vague
problem description.

FYI, unless a SolrCloud collection is using the implicit router, you
cannot add shards.  And if it *IS* using the implicit router, then you are
100% in control of document routing -- Solr cannot influence that at all.

Thanks,
Shawn








Re: Are the entries in managed-schema order dependent?

2017-12-20 Thread Michael Joyner

Thanks!


On 12/20/2017 11:37 AM, Erick Erickson wrote:

The schema is not order dependent, I freely mix-n-match the fieldType,
copyField and field definitions for instance.



On Wed, Dec 20, 2017 at 8:29 AM, Michael Joyner <mich...@newsrx.com> wrote:

Hey all,

I'm wanting to update our managed-schemas to include the latest options
available in the 6.6.2 branch. (point types for one)

I would like to be able to sort them and diff them (production vs dist
supplied) to create a simple patch that can be reviewed, edited if
necessary, and then applied to the production schemas.

I'm thinking this approach would be least human error prone, but, the
schemas would need to be diffable and I can only see this as doable if they
are sorted so that common parts diff out. I only see this approach easily
workable if the entries aren't order dependent. (Presuming I can get all the
various schema settings to fit neatly on single lines...).

Or does there exist a list of schema entries added along different point
releases?

-Mike/NewsRx





Are the entries in managed-schema order dependent?

2017-12-20 Thread Michael Joyner

Hey all,

I'm wanting to update our managed-schemas to include the latest options 
available in the 6.6.2 branch. (point types for one)


I would like to be able to sort them and diff them (production vs dist 
supplied) to create a simple patch that can be reviewed, edited if 
necessary, and then applied to the production schemas.


I'm thinking this approach would be least human error prone, but, the 
schemas would need to be diffable and I can only see this as doable if 
they are sorted so that common parts diff out. I only see this approach 
easily workable if the entries aren't order dependent. (Presuming I can 
get all the various schema settings to fit neatly on single lines...).


Or does there exist a list of schema entries added along different point 
releases?


-Mike/NewsRx



Is there a parsing issue with "OR NOT" or is something else going on? (Solr 6)

2017-10-02 Thread Michael Joyner

Hello all,

What is the difference between the following two queries that causes 
them to give different results? Is there a parsing issue with "OR NOT" 
or is something else going on?


a) ("batman" AND "indiana jones") OR NOT ("cancer") /*only seems to 
match the and clause*/


parsedquery=BoostedQuery(boost(+(+((+((_text_ws:batman)^2.0 | 
(_text_txt:batman)^0.5 | (_text_txt_en_split:batman)^0.1) 
+((_text_ws:"indiana jones")^2.0 | (_text_txt:"indiana jones")^0.5 | 
(_text_txt_en_split:"indiana jone")^0.1)) -(+((_text_ws:cancer)^2.0 | 
(_text_txt:cancer)^0.5 | (_text_txt_en_split:cancer)^0.1


b) ("batman" AND "indiana jones") OR (NOT ("cancer")) /*gives the 
results we expected*/


parsedquery=BoostedQuery(boost(+(+((+((_text_ws:batman)^2.0 | 
(_text_txt:batman)^0.5 | (_text_txt_en_split:batman)^0.1) 
+((_text_ws:"indiana jones")^2.0 | (_text_txt:"indiana jones")^0.5 | 
(_text_txt_en_split:"indiana jone")^0.1)) (-(+((_text_ws:cancer)^2.0 | 
(_text_txt:cancer)^0.5 | (_text_txt_en_split:cancer)^0.1)) +*:*)^1.0))


The first thing I notice is the '+*:*)^1.0' component in the 2nd query's 
parsedquery, which is not in the 1st query's parsedquery response. The 
first query does not seem to be matching any of the "NOT" articles to 
include in the union of sets and is not giving us the expected results. 
Is wrapping "NOT" a general requirement when preceded by an operator?


We are using SolrCloud 6.6 and are using q.op=AND with edismax.

Thanks!

-Michael/NewsRx

Full debug outputs:

{rawquerystring={!boost 
b=recip(ms(NOW/DAY,issuedate_tdt),3.16e-11,1,1)}{!edismax}(("batman" AND 
"indiana jones") OR NOT ("cancer")), querystring={!boost 
b=recip(ms(NOW/DAY,issuedate_tdt),3.16e-11,1,1)}{!edismax}(("batman" AND 
"indiana jones") OR NOT ("cancer")), 
parsedquery=BoostedQuery(boost(+(+((+((_text_ws:batman)^2.0 | 
(_text_txt:batman)^0.5 | (_text_txt_en_split:batman)^0.1) 
+((_text_ws:"indiana jones")^2.0 | (_text_txt:"indiana jones")^0.5 | 
(_text_txt_en_split:"indiana jone")^0.1)) -(+((_text_ws:cancer)^2.0 | 
(_text_txt:cancer)^0.5 | 
(_text_txt_en_split:cancer)^0.1,1.0/(3.16E-11*float(ms(const(150691680),date(issuedate_tdt)))+1.0))), 
parsedquery_toString=boost(+(+((+((_text_ws:batman)^2.0 | 
(_text_txt:batman)^0.5 | (_text_txt_en_split:batman)^0.1) 
+((_text_ws:"indiana jones")^2.0 | (_text_txt:"indiana jones")^0.5 | 
(_text_txt_en_split:"indiana jone")^0.1)) -(+((_text_ws:cancer)^2.0 | 
(_text_txt:cancer)^0.5 | 
(_text_txt_en_split:cancer)^0.1,1.0/(3.16E-11*float(ms(const(150691680),date(issuedate_tdt)))+1.0)), 
QParser=ExtendedDismaxQParser, altquerystring=null, boost_queries=null, 
parsed_boost_queries=[], boostfuncs=null, 
boost_str=recip(ms(NOW/DAY,issuedate_tdt),3.16e-11,1,1), 
boost_parsed=org.apache.lucene.queries.function.valuesource.ReciprocalFloatFunction:1.0/(3.16E-11*float(ms(const(150691680),date(issuedate_tdt)))+1.0), 
filter_queries=[issuedate_tdt:[2000\-09\-18T04\:00\:00Z/DAY TO 
2017\-10\-02T04\:00\:00Z/DAY+1DAY}, types_ss:(TrademarkApp OR 
Stockmarket OR AllClinicalTrials OR PressRelease OR Patent OR SEC OR 
Scholarly OR ClinicalTrial)], 
parsed_filter_queries=[+issuedate_tdt:[96924960 TO 150700320}, 
+(types_ss:TrademarkApp types_ss:Stockmarket types_ss:AllClinicalTrials 
types_ss:PressRelease types_ss:Patent types_ss:SEC types_ss:Scholarly 
types_ss:ClinicalTrial)]}


{rawquerystring={!boost 
b=recip(ms(NOW/DAY,issuedate_tdt),3.16e-11,1,1)}{!edismax}(("batman" AND 
"indiana jones") OR (NOT ("cancer"))), querystring={!boost 
b=recip(ms(NOW/DAY,issuedate_tdt),3.16e-11,1,1)}{!edismax}(("batman" AND 
"indiana jones") OR (NOT ("cancer"))), 
parsedquery=BoostedQuery(boost(+(+((+((_text_ws:batman)^2.0 | 
(_text_txt:batman)^0.5 | (_text_txt_en_split:batman)^0.1) 
+((_text_ws:"indiana jones")^2.0 | (_text_txt:"indiana jones")^0.5 | 
(_text_txt_en_split:"indiana jone")^0.1)) (-(+((_text_ws:cancer)^2.0 | 
(_text_txt:cancer)^0.5 | (_text_txt_en_split:cancer)^0.1)) 
+*:*)^1.0)),1.0/(3.16E-11*float(ms(const(150691680),date(issuedate_tdt)))+1.0))), 
parsedquery_toString=boost(+(+((+((_text_ws:batman)^2.0 | 
(_text_txt:batman)^0.5 | (_text_txt_en_split:batman)^0.1) 
+((_text_ws:"indiana jones")^2.0 | (_text_txt:"indiana jones")^0.5 | 
(_text_txt_en_split:"indiana jone")^0.1)) (-(+((_text_ws:cancer)^2.0 | 
(_text_txt:cancer)^0.5 | (_text_txt_en_split:cancer)^0.1)) 
+*:*)^1.0)),1.0/(3.16E-11*float(ms(const(150691680),date(issuedate_tdt)))+1.0)), 
QParser=ExtendedDismaxQParser, altquerystring=null, boost_queries=null, 
parsed_boost_queries=[], boostfuncs=null, 
boost_str=recip(ms(NOW/DAY,issuedate_tdt),3.16e-11,1,1), 
boost_parsed=org.apache.lucene.queries.function.valuesource.ReciprocalFloatFunction:1.0/(3.16E-11*float(ms(const(150691680),date(issuedate_tdt)))+1.0), 
filter_queries=[issuedate_tdt:[2000\-09\-18T04\:00\:00Z/DAY TO 
2017\-10\-02T04\:00\:00Z/DAY+1DAY}, types_ss:(TrademarkApp OR 
Stockmarket OR AllClinicalTrials OR PressRelease OR Patent OR SEC OR 

Is there a way to determine fields available for faceting for a search without doing the faceting?

2017-08-10 Thread Michael Joyner

Hey all!

Is there a way to determine fields available for faceting (those with 
data) for a search without actually doing the faceting for the fields?


-Mike/NewsRx



phrase highlight, exact phrases only?

2017-07-25 Thread Michael Joyner

Hello,


We are using highlighting and are looking for the exact phrase "HIV 
Prevention", but are receiving back highlighted snippets like the 
following, where non-phrase-matching portions are being highlighted. Is 
there a setting to highlight the entire phrase instead of any partial 
token matches?


==> Settings:

protected static final String QUERY_TYPE = "edismax";

 String highlight_query = "\"HIV Prevention\"";

query.addHighlightField("_text_ws");
query.addHighlightField("_text_txt");
query.addHighlightField("_text_txt_en_split");
query.set("hl.qparser", QUERY_TYPE);
query.set("hl.q", highlight_query);
query.setHighlight(true);
query.setHighlightSimplePre("");
query.setHighlightSimplePost("");
query.setHighlightSnippets(999);
query.setHighlightFragsize(100);
query.set("hl.usePhraseHighlighter", "true");
query.set("hl.highlightMultiTerm", "true");

==> Snippets:

SNIPPET:  impact on male viewers.” For more information on this research 
see: Emotional Appeals in HIV class="highlight">Prevention
SNIPPET:  Findings from University of Southern California Reveals New 
Findings on HIV/AIDS (Emotional Appeals in class="highlight">HIV
SNIPPET:  Prevention Campaigns: 
Unintended Stigma Effects) Immune System Diseases and Conditions - 
HIV/AIDS California
SNIPPET:  on male viewers.” For more information on this research see: 
Emotional Appeals in HIV class="highlight">Prevention Campaigns
SNIPPET:  Findings from University of Southern California Reveals New 
Findings on HIV/AIDS (Emotional Appeals in class="highlight">HIV
SNIPPET:  Prevention Campaigns: 
Unintended Stigma Effects) Immune System Diseases and Conditions - 
HIV/AIDS California




Re: mm = 1 and multi-field searches (update)

2017-07-24 Thread Michael Joyner

We are using qf= as in:

QF: plain_abstract_en^0.1 plain_abstract_text_general^0.5 
plain_abstract_text_ws^2 plain_subhead_text_ws^2 
plain_subhead_text_general^0.5 plain_subhead_en^0.1 
plain_title_text_ws^2 plain_title_text_general^0.5 plain_title_en^0.1 
keywords_text_ws^2 keywords_text_general^0.5 keywords_en^0.1
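
Pulling the thread together, here is a minimal SolrJ sketch of the kind of
request being discussed: edismax with qf spread across several weighted fields,
using q.op=AND instead of mm. The collection URL and the (abbreviated) field
list are placeholders based on the qf above, not a definitive setup:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class EdismaxQueryExample {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                     new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
            SolrQuery query = new SolrQuery("batman \"indiana jones\"");
            query.set("defType", "edismax");
            // Search across several weighted fields (abbreviated from the qf above).
            query.set("qf", "plain_title_text_ws^2 plain_abstract_text_ws^2 keywords_text_ws^2");
            // Require all terms via q.op=AND rather than mm=100%,
            // which is the workaround described in this thread.
            query.set("q.op", "AND");
            QueryResponse response = client.query(query);
            System.out.println("hits: " + response.getResults().getNumFound());
        }
    }
}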


On 07/21/2017 01:46 PM, Susheel Kumar wrote:

Interesting. If it's working for you then it's good, but to your original
question, qf seems to be working.

Adding to mailing list for the benefit of others.

On Fri, Jul 21, 2017 at 9:41 AM, Michael Joyner <mich...@newsrx.com> wrote:


Thanks,

We finally figured out that setting mm=100% doesn't seem to provide the
desired results across multiple fields.

We switched to using q.op=AND and it seems to work as desired at first
glance.

We additionally discovered that when mm=100% and we use an explicit OR
operator in the queries, the OR operator seems to get ignored; we need to
set mm=0 and q.op=AND for the OR operator to work.

-Mike/NewsRx

On 07/10/2017 05:50 PM, Susheel Kumar wrote:

How are you specifying multiple fields? Use the qf parameter to specify
multiple fields, e.g.
http://localhost:8983/solr/techproducts/select?indent=on=Samsung%20Maxtor%20hard=json=edismax=name%20manu=on=1


On Mon, Jul 10, 2017 at 4:51 PM, Michael Joyner <mich...@newsrx.com> 
<mich...@newsrx.com> wrote:


Hello all,

How does setting mm = 1 for edismax impact multi-field searches?

We set mm to 1 and get zero results back when specifying multiple fields
to search across.

Is there a way to set mm = 1 for each field, but to OR the individual
field searches together?

-Mike/NewsRx









mm = 1 and multi-field searches

2017-07-10 Thread Michael Joyner

Hello all,

How does setting mm = 1 for edismax impact multi-field searches?

We set mm to 1 and get zero results back when specifying multiple fields 
to search across.


Is there a way to set mm = 1 for each field, but to OR the individual 
field searches together?


-Mike/NewsRx



The unified highlighter html escaping. Seems rather extreme...

2017-05-26 Thread Michael Joyner

Isn't the unified HTML escaper a bit extreme in its escaping?

It makes it hard to deal with for simple post-processing.

The original HTML escaper seems to do minimal escaping, rather than 
escaping every non-alphabetical character it can find.


Also, is there a way to control how much text is returned as context 
around the highlighted frag?


Compare:


Unified Snippet: 

Re: Solr Cloud 6.5.0 Replicas go down while indexing

2017-04-04 Thread Michael Joyner
Try increasing the number of connections your ZooKeeper allows to a very 
large number.
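
If it helps, the relevant knob on the ZooKeeper side is maxClientCnxns in
zoo.cfg, which caps concurrent connections per client host. The value below is
only an example; pick a bound that suits your cluster (0 disables the limit):

# zoo.cfg
# Raise the per-host client connection limit (ZooKeeper's default is low).
maxClientCnxns=500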



On 04/04/2017 09:02 AM, Salih Sen wrote:

Hi,

One of the replicas went down again today, somehow disabling all 
updates to the cluster with the error message "Cannot talk to ZooKeeper - 
Updates are disabled.” for half an hour.


ZK Leader was on the same server with Solr instance so I doubt it has 
anything to do with network (at least between Solr and ZK leader 
node), restarting the ZK leader seems to resolve the issue and cluster 
accepting updates again.



== Solr Node
WARN  - 2017-04-04 11:49:14.414; [   ] 
org.apache.solr.common.cloud.ConnectionManager; Watcher 
org.apache.solr.common.cloud.ConnectionManager@44ca0f2f name: 
ZooKeeperConnection Watcher:192.168.30.32:2181 
,192.168.30.33:2181 
,192.168.30.24:2181 
 got event WatchedEvent state:Disconnected 
type:None path:null path: null type: None
WARN  - 2017-04-04 11:49:15.723; [   ] 
org.apache.solr.common.cloud.ConnectionManager; zkClient has disconnected
WARN  - 2017-04-04 11:49:15.727; [   ] 
org.apache.solr.common.cloud.ConnectionManager; Watcher 
org.apache.solr.common.cloud.ConnectionManager@44ca0f2f name: 
ZooKeeperConnection Watcher:192.168.30.32:2181 
,192.168.30.33:2181 
,192.168.30.24:2181 
 got event WatchedEvent state:Expired 
type:None path:null path: null type: None
WARN  - 2017-04-04 11:49:15.727; [   ] 
org.apache.solr.common.cloud.ConnectionManager; Our previous ZooKeeper 
session was expired. Attempting to reconnect to recover relationship 
with ZooKeeper...
WARN  - 2017-04-04 11:49:15.728; [   ] 
org.apache.solr.common.cloud.DefaultConnectionStrategy; Connection 
expired - starting a new one...
ERROR - 2017-04-04 11:49:22.040; [c:doc s:shard6 r:core_node27 
x:doc_shard6_replica1] org.apache.solr.common.SolrException; 
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - 
Updates are disabled.
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1739)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:703)
at 
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
at 
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:306)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
at 
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:271)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
at 
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:173)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
at 
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)
at 
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:2440)
at 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:298)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 

Re: What is the bottleneck for an optimise operation? / solve the disk space and time issues by specifying multiple segments to optimize

2017-03-02 Thread Michael Joyner
You can solve the disk space and time issues by specifying multiple 
segments to optimize down to instead of a single segment.


When we reindex we have to optimize or we end up with hundreds of 
segments and very horrible performance.


We optimize down to like 16 segments or so and it doesn't do the 3x disk 
space thing and usually runs in a decent amount of time. (we have >50 
million articles in one of our solr indexes).
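
For reference, a minimal SolrJ sketch of optimizing down to a fixed segment
count rather than to a single segment; the collection URL and the segment
count are placeholders:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class PartialOptimize {
    public static void main(String[] args) throws Exception {
        try (SolrClient client =
                     new HttpSolrClient.Builder("http://localhost:8983/solr/articles").build()) {
            // waitFlush=true, waitSearcher=true, maxSegments=16:
            // merge down to at most 16 segments instead of 1, which keeps the
            // temporary disk usage and run time well below a single-segment optimize.
            client.optimize(true, true, 16);
        }
    }
}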



On 03/02/2017 10:20 AM, David Hastings wrote:

Agreed, and the fact that it takes three times the space is part of the reason
it takes so long: that 190GB index ends up writing another 380GB until it
compresses down and deletes the two leftover files. It's a pretty hefty
operation.

On Thu, Mar 2, 2017 at 10:13 AM, Alexandre Rafalovitch 
wrote:


Optimize operation is no longer recommended for Solr, as the
background merges got a lot smarter.

It is an extremely expensive operation that can require up to 3-times
amount of disk during the processing.

This is not to say yours is a valid question, which I am leaving to
others to respond.

Regards,
Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 2 March 2017 at 10:04, Caruana, Matthew  wrote:

I’m currently performing an optimise operation on a ~190GB index with

about 4 million documents. The process has been running for hours.

This is surprising, because the machine is an EC2 r4.xlarge with four

cores and 30GB of RAM, 24GB of which is allocated to the JVM.

The load average has been steady at about 1.3. Memory usage is 25% or

less the whole time. iostat reports ~6% util.

What gives?

Running Solr 6.4.1.




Huh? What does this even mean? Not enough time left to update replicas. However, the schema is updated already.

2017-02-09 Thread Michael Joyner


Huh? What does this even mean? If the schema is updated already how can 
we be out of time to update it?


Not enough time left to update replicas. However, the schema is updated 
already.




File system choices?

2016-12-15 Thread Michael Joyner (NewsRx)

Hello all,

Can the Solr indexes be safely stored and used via mounted NFS shares?

-Mike



Re: How to check optimized or disk free status via solrj for a particular collection?

2016-12-12 Thread Michael Joyner
We are having an issue with running out of space when trying to do a 
full re-index.


We are indexing with autocommit at 30 minutes.

We have it set to only optimize at the end of an indexing cycle.


On 12/12/2016 02:43 PM, Erick Erickson wrote:

First off, optimize is actually rarely necessary. I wouldn't bother
unless you have measurements to prove that it's desirable.

I would _certainly_ not call optimize every 10M docs. If you must call
it at all call it exactly once when indexing is complete. But see
above.

As far as the commit, I'd just set the autocommit settings in
solrconfig.xml to something "reasonable" and forget it. I usually use
time rather than doc count as it's a little more predictable. I often
use 60 seconds, but it can be longer. The longer it is, the bigger
your tlog will grow and if Solr shuts down forcefully the longer
replaying may take. Here's the whole writeup on this topic:

https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Running out of space during indexing with about 30% utilization is
very odd. My guess is that you're trying to take too much control.
Having multiple optimizations going on at once would be a very good
way to run out of disk space.

And I'm assuming one replica's index per disk, or that you're reporting
aggregate index size per disk when you say 30%. Having three replicas
on the same disk each consuming 30% is A Bad Thing.

Best,
Erick

On Mon, Dec 12, 2016 at 8:36 AM, Michael Joyner <mich...@newsrx.com> wrote:

Halp!

I need to reindex over 43 million documents. When optimized, the collection
currently uses < 30% of disk space; we tried it over this weekend and it ran
out of space during the reindexing.

I'm thinking the best solution for what we are trying to do is to call
commit/optimize every 10,000,000 documents or so and then wait for the
optimize to complete.

How to check optimized status via solrj for a particular collection?

Also, is there a way to check free space per shard by collection?

-Mike





How to check optimized or disk free status via solrj for a particular collection?

2016-12-12 Thread Michael Joyner

Halp!

I need to reindex over 43 million documents. When optimized, the 
collection currently uses < 30% of disk space; we tried it over this 
weekend and it ran out of space during the reindexing.


I'm thinking the best solution for what we are trying to do is to 
call commit/optimize every 10,000,000 documents or so and then wait for 
the optimize to complete.


How to check optimized status via solrj for a particular collection?

Also, is there a way to check free space per shard by collection?

-Mike
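
As a partial answer to the first question, here is a hedged SolrJ sketch using
the CoreAdmin STATUS call against one node. The core name and node URL are
placeholders, and the "index", "segmentCount", and "sizeInBytes" keys are the
names the STATUS response is assumed to use, so verify them against your
version; note this reports index size, not free disk space:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.response.CoreAdminResponse;
import org.apache.solr.common.util.NamedList;

public class CoreStatusCheck {
    public static void main(String[] args) throws Exception {
        String core = "newsletters_shard1_replica0"; // placeholder core name
        try (SolrClient client =
                     new HttpSolrClient.Builder("http://solr-0001:8983/solr").build()) {
            CoreAdminResponse status = CoreAdminRequest.getStatus(core, client);
            NamedList<Object> coreStatus = status.getCoreStatus(core);
            @SuppressWarnings("unchecked")
            NamedList<Object> index = (NamedList<Object>) coreStatus.get("index");
            // A segmentCount of 1 means the core is fully optimized.
            System.out.println("segments: " + index.get("segmentCount"));
            System.out.println("index size (bytes): " + index.get("sizeInBytes"));
        }
    }
}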



Re: Failure when trying to full sync, out of space ? Doesn't delete old segments before full sync?

2016-11-28 Thread Michael Joyner



On 11/28/2016 12:26 PM, Erick Erickson wrote:

Well, such checks could be put in, but they don't get past the basic problem.



And all this masks your real problem; you didn't have enough disk
space to optimize in the first place. Even during regular indexing w/o
optimizing, Lucene segment merging can always theoretically merge all
your segments at once. Therefore you always need at _least_ as much
free space on your disks as all your indexes occupy to be sure you
won't hit a disk-full problem. The rest would be band-aids. Although I
suppose refusing to even start if there wasn't enough free disk space
isn't a bad idea, it's not foolproof though


If such a "warning" feature is added, it's not "foolproof" that I would 
expect, and I wouldn't expect it be able to "predict" usage caused by 
events happening after it passes a basic initial check. I am just 
thinking a basic up-front check that indicates "it just ain't happening" 
might be useful.


So.. how does one handle needing all this "free space" between major 
index updates when one gets charged by the GB for allocated space 
without regard to actual storage usage?






Re: Failure when trying to full sync, out of space ? Doesn't delete old segments before full sync?

2016-11-28 Thread Michael Joyner
We've been trying to run at 40% estimated usage when optimized, but are 
doing a large amount of index updates ... 40% usage in this scenario 
seems to be too high...



On 11/28/2016 12:26 PM, Erick Erickson wrote:

Well, such checks could be put in, but they don't get past the basic problem.

bq: If the segments are out of date and we are pulling from another
node before coming "online" why aren't the old segments deleted?

because you run the risk of losing _all_ your data and having nothing
at all. The process is
1> pull all the segments down
2> rewrite the segments file

Until <2>, you can still use your old index. Also consider a full
sync in master/slave mode. I optimize on the master and Solr then
detects that it'll be a full sync and deletes the entire active
index.

bq: Is this something that can be enabled in the master solrconfig.xml file?
no

bq: ...is there a reason a basic disk space check isn't done 
That would not be very robust. Consider the index is 1G and I have
1.5G of free space. Now replication makes the check and starts.
However, during that time segments are merged consuming .75G. Boom,
disk full again.

Additionally, any checks would be per core. What if 10 cores start
replication as above at once? Which would absolutely happen if you
have 10 replicas for the same shard in one JVM...

And all this masks your real problem; you didn't have enough disk
space to optimize in the first place. Even during regular indexing w/o
optimizing, Lucene segment merging can always theoretically merge all
your segments at once. Therefore you always need at _least_ as much
free space on your disks as all your indexes occupy to be sure you
won't hit a disk-full problem. The rest would be band-aids. Although I
suppose refusing to even start if there wasn't enough free disk space
isn't a bad idea, it's not foolproof though

Best,
Erick


On Mon, Nov 28, 2016 at 8:39 AM, Michael Joyner <mich...@newsrx.com> wrote:

Hello all,

I'm running out of space when trying to restart nodes to get a cluster
back up fully operational where a node ran out of space during an optimize.

It appears to be trying to do a full sync from another node, but doesn't
take care to check available space before starting downloads and doesn't
delete the out of date segment files before attempting to do the full sync.

If the segments are out of date and we are pulling from another node before
coming "online" why aren't the old segments deleted? Is this something that
can be enabled in the master solrconfig.xml file?

It seems to know what size the segments are before they are transferred, is
there a reason a basic disk space check isn't done for the target partition
with an immediate abort done if the destination's space looks like it would
go negative before attempting sync? Is this something that can be enabled in
the master solrconfig.xml file? This would be a lot more useful (IMHO) than
waiting for a full sync to complete only to run out of space after several
hundred gigs of data is transferred with automatic cluster recovery failing
as a result.

This happens when doing a 'sudo service solr restart'

(Workaround, shutdown offending node, manually delete segment index folders
and tlog files, start node)

Exception:

WARN  - 2016-11-28 16:15:16.291;
org.apache.solr.handler.IndexFetcher$FileFetcher; Error in fetching file:
_2f6i.cfs (downloaded 2317352960 of 5257809205 bytes)
java.io.IOException: No space left on device
 at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
 at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
 at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
 at sun.nio.ch.IOUtil.write(IOUtil.java:65)
 at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
 at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
 at java.nio.channels.Channels.writeFully(Channels.java:101)
 at java.nio.channels.Channels.access$000(Channels.java:61)
 at java.nio.channels.Channels$1.write(Channels.java:174)
 at
org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:419)
 at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73)
 at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
 at
org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
 at
org.apache.solr.handler.IndexFetcher$DirectoryFile.write(IndexFetcher.java:1634)
 at
org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1491)
 at
org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1429)
 at
org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:855)
 at
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:434)
 at
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:251)
 at
org.apache.solr.handler.ReplicationHandl

Failure when trying to full sync, out of space ? Doesn't delete old segments before full sync?

2016-11-28 Thread Michael Joyner

Hello all,

I'm running out of space when trying to restart nodes to get a cluster 
back up fully operational where a node ran out of space during an optimize.


It appears to be trying to do a full sync from another node, but doesn't 
take care to check available space before starting downloads and doesn't 
delete the out of date segment files before attempting to do the full sync.


If the segments are out of date and we are pulling from another node 
before coming "online" why aren't the old segments deleted? Is this 
something that can be enabled in the master solrconfig.xml file?


It seems to know what size the segments are before they are transferred, 
is there a reason a basic disk space check isn't done for the target 
partition with an immediate abort done if the destination's space looks 
like it would go negative before attempting sync? Is this something that 
can be enabled in the master solrconfig.xml file? This would be a lot 
more useful (IMHO) than waiting for a full sync to complete only to run 
out of space after several hundred gigs of data is transferred with 
automatic cluster recovery failing as a result.


This happens when doing a 'sudo service solr restart'

(Workaround, shutdown offending node, manually delete segment index 
folders and tlog files, start node)


Exception:

WARN  - 2016-11-28 16:15:16.291; 
org.apache.solr.handler.IndexFetcher$FileFetcher; Error in fetching 
file: _2f6i.cfs (downloaded 2317352960 of 5257809205 bytes)

java.io.IOException: No space left on device
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
at java.nio.channels.Channels.writeFully(Channels.java:101)
at java.nio.channels.Channels.access$000(Channels.java:61)
at java.nio.channels.Channels$1.write(Channels.java:174)
at 
org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:419)

at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at 
org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
at 
org.apache.solr.handler.IndexFetcher$DirectoryFile.write(IndexFetcher.java:1634)
at 
org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1491)
at 
org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1429)
at 
org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:855)
at 
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:434)
at 
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:251)
at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397)
at 
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:156)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:408)
at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:221)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

-Mike



A good way to extract "facets" from json.facet response via solrj?

2016-11-22 Thread Michael Joyner

Hello all,

It seems I can't find a "getFacets" method for SolrJ when handling a 
query response from a json.facet call.


I see that I can get a top level opaque object via "Object obj = 
response.getResponse().get("facets");"


Is there any code in SolrJ to parse this out as an easy to use navigable 
object?


-Mike
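
Absent a dedicated helper, here is a hedged sketch of walking that opaque
object by hand. It assumes the javabin response represents the facets block as
nested NamedList structures with each "buckets" entry as a List of NamedLists
(inferred from the json.facet response shape, so verify against a real
response); the facet name is whatever was used in the json.facet request:

import java.util.List;

import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class JsonFacetParser {

    // Print the buckets of one terms facet (e.g. "code_s") from a json.facet response.
    @SuppressWarnings("unchecked")
    public static void printBuckets(QueryResponse response, String facetName) {
        NamedList<Object> facets = (NamedList<Object>) response.getResponse().get("facets");
        if (facets == null) {
            return;
        }
        NamedList<Object> facet = (NamedList<Object>) facets.get(facetName);
        List<NamedList<Object>> buckets = (List<NamedList<Object>>) facet.get("buckets");
        for (NamedList<Object> bucket : buckets) {
            System.out.println(bucket.get("val") + " -> " + bucket.get("count"));
        }
    }
}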



Re: How to get "max(date)" from a facet field? (Solr 6.3)

2016-11-21 Thread Michael Joyner
I think I figured out a very hacky "work around" - does this look like 
it will work consistently?


json.facet={
code_s:{
limit:-1,
type:terms,
field:code_s,
facet:{
issuedate_tdt:{
type:terms,
field:issuedate_tdt,
sort:{idate:desc},
limit:1,
/** convert date to a number for sorting, as the date
is being faceted against itself, it should have
 only one possible facet value */
facet:{
idate:"max(issuedate_tdt)"
}
}
}
}
}

On 11/21/2016 03:42 PM, Michael Joyner wrote:

Help,

(Solr 6.3)

Trying to do a "sub-facet" using the new json faceting API, but can't 
seem to figure out how to get the "max" date in the subfacet?


I've tried a couple of different ways:





Is there a way to increase the web ui timout when running test queries?

2016-11-21 Thread Michael Joyner
Argh! I'm trying to run some test queries using the web UI, but it keeps 
aborting the connection at 10 seconds. Is there any way to easily change 
this?


(We currently have heavy indexing going on and the cache keeps getting 
"un-warmed").





How to get "max(date)" from a facet field? (Solr 6.3)

2016-11-21 Thread Michael Joyner

Help,

(Solr 6.3)

Trying to do a "sub-facet" using the new json faceting API, but can't 
seem to figure out how to get the "max" date in the subfacet?


I've tried a couple of different ways:

== query ==

json.facet={
code_s:{
limit:-1,
type:terms,field:code_s,facet:{
issuedate_tdt:"max(issuedate_tdt)"
}
}
}

== partial response ==

facets":{
"count":1310359,
"code_s":{
  "buckets":[{
  "val":"5W",
  "count":255437,
  "issuedate_tdt":1.4794452E12},
{
  "val":"LS",
  "count":201407,
  "issuedate_tdt":1.479186E12},

-- the date values seem to come back out as longs converted to 
float/double which are then truncated and put into scientific notation? --


== query that barfs with fatal exception ==

json.facet={
code_s:{
limit:-1,
type:terms,
field:code_s,
facet:{
issuedate_tdt:{
type:terms,
field:issuedate_tdt,
sort:{val:desc},
limit:1
}
}
}
}

== barfed response ==

"error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException"],
"msg":"org.apache.solr.client.solrj.SolrServerException: No live 
SolrServers available to handle this 
request:[http://solr-0001:8983/solr/newsletters_shard1_replica0, 
http://solr-0002:8983/solr/newsletters_shard1_replica1, 
http://solr-0003:8983/solr/newsletters_shard2_replica0];,
"trace":"org.apache.solr.common.SolrException: 
org.apache.solr.client.solrj.SolrServerException: No live SolrServers 
available to handle this 
request:[http://solr-0001:8983/solr/newsletters_shard1_replica0, 
http://solr-0002:8983/solr/newsletters_shard1_replica1, 
http://solr-0003:8983/solr/newsletters_shard2_replica0]\n\tat 
.. 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:435)\n\tat 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:387)\n\t... 
9 more\n",

"code":500}

== query with null pointer barfed response ==

json.facet={
code_s:{
limit:-1,
type:terms,
field:code_s,
facet:{
issuedate_tdt:{
type:terms,
field:issuedate_tdt,
sort:{issuedate_tdt:desc},
limit:1
}
}
}
}

== response with NPE ==

"metadata":[
"error-class","org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException",
"root-error-class","org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException"],
"msg":"Error from server at 
http://solr-0003:8983/solr/newsletters_shard2_replica0: 
java.lang.NullPointerException\n",
"trace":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at 
http://solr-0003:8983/solr/newsletters_shard2_replica0: 
java.lang.NullPointerException\n\n\tat 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:593)\n\tat 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:262)\n\tat 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)\n\tat 
org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)\n\tat 
org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:195)\n\tat 
java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat 
java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)\n\tat 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)\n\tat 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)\n\tat 
java.lang.Thread.run(Thread.java:745)\n",

"code":500}

Any thoughts on how I can accomplish getting the max date per code where 
each code has many dates?




Re: Using solr(cloud) as source-of-truth for data (with no backing external db)

2016-11-21 Thread Michael Joyner
Have a "store only" text field that contains a serialized (json?) of the 
master object for deserilization as part of the results parsing if you 
are wanting to save a DB lookup.


I would still store everything in a DB though to have a "master" copy of 
everthing.
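
As an illustration only, a stored-but-not-indexed field along these lines could
hold the serialized object; the field name is a placeholder and the exact
attributes should be checked against your schema version:

<!-- managed-schema: stored copy of the source object, not searchable itself -->
<field name="payload_json_str" type="string" indexed="false" stored="true"/>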



On 11/18/2016 04:45 AM, Dorian Hoxha wrote:

@alex
That makes sense, but it can be ~fixed by just storing every field that you
need.

@Walter
Many of those things are missing from many NoSQL DBs, yet they're used as a
source of data.
As long as the backup is "point in time", meaning a consistent timestamp
across all shards, it ~should be OK for many use cases.

The 1-line-curl may need a patch to be disabled from config.

On Thu, Nov 17, 2016 at 6:29 PM, Walter Underwood 
wrote:


I agree, it is a bad idea.

Solr is missing nearly everything you want in a repository, because it is
not designed to be a repository.

Does not have:

* access control
* transactions
* transactional backup
* dump and load
* schema migration
* versioning

And so on.

Also, I’m glad to share a one-line curl command that will delete all the
documents
in your collection.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Nov 17, 2016, at 1:20 AM, Alexandre Rafalovitch 

wrote:

I've heard of people doing it but it is not recommended.

One of the biggest implementation breakthroughs is that - after the
initial learning curve - you will start mapping your input data to
signals. Those signals will not look very much like your original data
and therefore are not terribly suitable to be the source of it.

We are talking copyFields, UpdateRequestProcessor pre-processing,
fields that are not stored, nested documents flattening,
denormalization, etc. Getting back from that to original shape of data
is painful.

Regards,
   Alex.

Solr Example reading group is starting November 2016, join us at
http://j.mp/SolrERG
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 17 November 2016 at 18:46, Dorian Hoxha 

wrote:

Hi,

Anyone use solr for source-of-data with no `normal` db (of course with
normal backups/replication) ?

Are there any drawbacks ?

Thank You






Issue with empty strings not being indexed/stored?

2016-11-15 Thread Michael Joyner

Hello all,

We've been indexing documents with empty strings for some fields.

After our latest round of Solr/SolrJ updates to 6.3.0 we have discovered 
that fields with empty strings are no longer being stored, effectively 
storing documents with those fields as being NULL/NOT-PRESENT instead of 
EMPTY. (Most definitely not the same thing!)


We are using SolrInputDocuments.

Documents indexed before our latest round of updates have the fields 
with empty strings just fine, new documents indexed since the updates don't.


Example field that is in the input document that isn't showing up as 
populated in the query results:


"mesh_s" : {
"boost" : 1.0,
"firstValue" : "",
"name" : "mesh_s",
"value" : "",
"valueCount" : 1,
"values" : [ "" ]
  }

-Mike




Question about shards, compositeid, and routing

2016-11-02 Thread Michael Joyner (NewsRx)
Ref: 
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud


If an update specifies only the non-routed id, will SolrCloud select the 
correct shard for updating?


If an update specifies a different route, will SolrCloud delete the 
previous document with the same id but with the different routing? (Will 
it effectively change which shard the document is stored on?)


Does the document id have to be unique ignoring the routing prefix? (Is 
the routing prefix considered as part of the id for uniqueness?)
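
For reference, and as my reading of the compositeId docs rather than a
confirmed answer to the questions above: with the compositeId router the
route prefix is embedded in the id value itself, separated by '!', and the
full "prefix!suffix" string is what ends up in the uniqueKey field. A
hypothetical SolrJ sketch:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CompositeIdSketch {

    public static void index(SolrClient solrClient) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        // "tenantA" is hashed to pick the shard; "tenantA!doc123" as a whole
        // is the value stored in the uniqueKey field.
        doc.addField("id", "tenantA!doc123");
        doc.addField("title_s", "example title");
        solrClient.add("mycollection", doc);
    }
}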



-Mike



Re: Can't load schema managed-schema: unknown field 'id'

2016-07-26 Thread Michael Joyner

Finally got it to straighten out.

So I have two collections, my test collection and my production collection.

I "fat fingered" the test collection and both collections were 
complaining about the missing "id" field.


I downloaded the config from both collections and it was showing the id 
field in place (?)


I restarted the zookeeper I was talking to and then redownloaded the 
configs and now it was gone.


Added it (and _version_) back, re-upped, restarted the solr node local 
to that zookeeper and it stopped complaining about the missing id field.


Now waiting on the node I restarted to show "green".

-MIke


On 07/26/2016 04:32 PM, Alexandre Drouin wrote:

@Michael - there are GUI available for ZooKeeper: 
http://stackoverflow.com/questions/24551835/available-gui-for-zookeeper
I used the Eclipse plugin before and while it is a bit clunky it gets the job 
done.


Alexandre Drouin


-Original Message-
From: John Bickerstaff [mailto:j...@johnbickerstaff.com]
Sent: July 26, 2016 4:21 PM
To: solr-user@lucene.apache.org
Subject: Re: Can't load schema managed-schema: unknown field 'id'
Importance: High

@Michael - somewhere there should be a "conf" directory for your SOLR instance. 
 For my Dev efforts, I moved it to a different directory and I forget where it was, 
originally -- but if you search for solrconfig.xml or schema.xml, you should find it.

It could be on your servers (or on only one of them) or, if someone has done a 
really good job, it's in source control somewhere...

On Tue, Jul 26, 2016 at 2:17 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:


 

and further on in the file...

<uniqueKey>id</uniqueKey>


On Tue, Jul 26, 2016 at 2:17 PM, John Bickerstaff <
j...@johnbickerstaff.com> wrote:


I don't see a managed schema file.  As far as I understand it, id is
set as a "uniqueKey" in the schema.xml file...

On Tue, Jul 26, 2016 at 2:11 PM, Michael Joyner <mich...@newsrx.com>
wrote:


ok, I think I need to do a manual edit on the managed-schema file
but I get "NoNode" for /managed-schema when trying to use the zkcli.sh file?


How can I get to this file and edit it?


On 07/26/2016 03:05 PM, Alexandre Drouin wrote:


Hello,

You may have a uniqueKey that points to a field that does not exist
anymore.  You can try adding an "id" field using Solr's UI or the
schema API since you are using the managed-schema.


Alexandre Drouin

-Original Message-
From: Michael Joyner [mailto:mich...@newsrx.com]
Sent: July 26, 2016 2:34 PM
To: solr-user@lucene.apache.org
Subject: Can't load schema managed-schema: unknown field 'id'

Help!

What is the best way to recover from:

Can't load schema managed-schema: unknown field 'id'

I was managing the schema on a test collection, fat fingered it, but now
I find out the schema ops seem to be altering all collections on the core?
SolrCloud 5.5.1

-Mike







Re: Can't load schema managed-schema: unknown field 'id'

2016-07-26 Thread Michael Joyner

@John

I am using a managed schema with zookeeper/solrcloud.


On 07/26/2016 04:21 PM, John Bickerstaff wrote:

@Michael - somewhere there should be a "conf" directory for your SOLR
instance.  For my Dev efforts, I moved it to a different directory and I
forget where it was, originally -- but if you search for solrconfig.xml or
schema.xml, you should find it.

It could be on your servers (or on only one of them) or, if someone has
done a really good job, it's in source control somewhere...

On Tue, Jul 26, 2016 at 2:17 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:


 

and further on in the file...


<uniqueKey>id</uniqueKey>


On Tue, Jul 26, 2016 at 2:17 PM, John Bickerstaff <
j...@johnbickerstaff.com> wrote:


I don't see a managed schema file.  As far as I understand it, id is set
as a "uniqueKey" in the schema.xml file...

On Tue, Jul 26, 2016 at 2:11 PM, Michael Joyner <mich...@newsrx.com>
wrote:


ok, I think I need to do a manual edit on the managed-schema file but I
get "NoNode" for /managed-schema when trying to use the zkcli.sh file?


How can I get to this file and edit it?


On 07/26/2016 03:05 PM, Alexandre Drouin wrote:


Hello,

You may have a uniqueKey that points to a field that does not exist
anymore.  You can try adding an "id" field using Solr's UI or the schema
API since you are using the managed-schema.


Alexandre Drouin

-Original Message-
From: Michael Joyner [mailto:mich...@newsrx.com]
Sent: July 26, 2016 2:34 PM
To: solr-user@lucene.apache.org
Subject: Can't load schema managed-schema: unknown field 'id'

Help!

What is the best way to recover from:

Can't load schema managed-schema: unknown field 'id'

I was managing the schema on a test collection, fat fingered it, but now
I find out the schema ops seem to be altering all collections on the core?
SolrCloud 5.5.1

-Mike







Re: Can't load schema managed-schema: unknown field 'id'

2016-07-26 Thread Michael Joyner

ok...

I downloaded the config for both of my collections and the downloaded 
managed-schema file shows "id" as defined? But the online view in the UI 
shows it as not defined?


I've tried re-upping the config and nothing changes.

-Mike



On 07/26/2016 04:11 PM, John Bickerstaff wrote:

@Michael - if you're on Linux and decide to take Alexandre's advice, I can
possibly save you some time.  I wrestled with getting the data in and out
of zookeeper a while ago...

sudo /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -confdir
/home/john/conf/ -confname collectionName -z 192.168.56.5/solr5_4

Explanation:

sudo /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig = run
the code that sends config files (whatever files you modify) over to
Zookeeper

-confdir /home/john/conf/ = find the configuration directory here

-confname collectionName  = apply the configuration to this collection name

-z 192.168.56.5/solr5_4 - find Zookeeper here - and use the solr5_4
"chroot" which already exists in Zookeeper  (If you don't have chroot in
Zookeeper, ignore and don't use the slash)
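
The download direction should be the same invocation with -cmd downconfig
instead of upconfig (untested here, reusing the example paths above):

sudo /opt/solr/server/scripts/cloud-scripts/zkcli.sh -cmd downconfig -confdir
/home/john/conf/ -confname collectionName -z 192.168.56.5/solr5_4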





On Tue, Jul 26, 2016 at 1:55 PM, Alexandre Drouin <
alexandre.dro...@orckestra.com> wrote:


Other than deleting the collection, I think you'll have to edit the
managed-schema file manually.

Since you are using SolrCloud you will need to use Solr's zkcli (
https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities)
utility to download and upload the file from ZooKeeper.


Alexandre Drouin


-Original Message-
From: Michael Joyner [mailto:mich...@newsrx.com]
Sent: July 26, 2016 3:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Can't load schema managed-schema: unknown field 'id'
Importance: High

Same error via the UI:

Can't load schema managed-schema: unknown field 'id'


On 07/26/2016 03:05 PM, Alexandre Drouin wrote:

Hello,

You may have a uniqueKey that points to a field that does not exist

anymore.  You can try adding an "id" field using Solr's UI or the schema
API since you are using the managed-schema.


Alexandre Drouin

-Original Message-
From: Michael Joyner [mailto:mich...@newsrx.com]
Sent: July 26, 2016 2:34 PM
To: solr-user@lucene.apache.org
Subject: Can't load schema managed-schema: unknown field 'id'

Help!

What is the best way to recover from:

Can't load schema managed-schema: unknown field 'id'

I was managing the schema on a test collection, fat fingered it, but now
I find out the schema ops seem to be altering all collections on the core?
SolrCloud 5.5.1

-Mike






Re: Can't load schema managed-schema: unknown field 'id'

2016-07-26 Thread Michael Joyner
ok, I think I need to do a manual edit on the managed-schema file but I 
get "NoNode" for /managed-schema when trying to use the zkcli.sh file?



How can I get to this file and edit it?


On 07/26/2016 03:05 PM, Alexandre Drouin wrote:

Hello,

You may have a uniqueKey that points to a field that does not exist anymore.  You can try 
adding an "id" field using Solr's UI or the schema API since you are using the 
managed-schema.


Alexandre Drouin

-Original Message-
From: Michael Joyner [mailto:mich...@newsrx.com]
Sent: July 26, 2016 2:34 PM
To: solr-user@lucene.apache.org
Subject: Can't load schema managed-schema: unknown field 'id'

Help!

What is the best way to recover from:

Can't load schema managed-schema: unknown field 'id'

I was managing the schema on a test collection, fat fingered it, but now
I find out the schema ops seem to be altering all collections on the core?
SolrCloud 5.5.1

-Mike




Re: Can't load schema managed-schema: unknown field 'id'

2016-07-26 Thread Michael Joyner

Same error via the UI:

Can't load schema managed-schema: unknown field 'id'


On 07/26/2016 03:05 PM, Alexandre Drouin wrote:

Hello,

You may have a uniqueKey that points to a field that does not exist anymore.  You can try 
adding an "id" field using Solr's UI or the schema API since you are using the 
managed-schema.


Alexandre Drouin

-Original Message-----
From: Michael Joyner [mailto:mich...@newsrx.com]
Sent: July 26, 2016 2:34 PM
To: solr-user@lucene.apache.org
Subject: Can't load schema managed-schema: unknown field 'id'

Help!

What is the best way to recover from:

Can't load schema managed-schema: unknown field 'id'

I was managing the schema on a test collection, fat fingered it, but now
I find out the schema ops seem to be altering all collections on the core?
SolrCloud 5.5.1

-Mike




Re: Can't load schema managed-schema: unknown field 'id'

2016-07-26 Thread Michael Joyner

The schema API is failing with the unknown field "id" error.

Where in the UI could I try adding this field back at?
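
For reference, an add-field call through the SolrJ Schema API looks roughly
like the following (a sketch; the field attributes shown are illustrative,
and this is the style of call that is currently failing here):

import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

public class AddIdFieldSketch {

    public static void addIdField(SolrClient solrClient, String collection) throws Exception {
        Map<String, Object> attrs = new LinkedHashMap<>();
        attrs.put("name", "id");
        attrs.put("type", "string");
        attrs.put("indexed", true);
        attrs.put("stored", true);
        attrs.put("required", true);
        SchemaRequest.AddField addField = new SchemaRequest.AddField(attrs);
        SchemaResponse.UpdateResponse response = addField.process(solrClient, collection);
        System.out.println("add-field status: " + response.getStatus());
    }
}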


On 07/26/2016 03:05 PM, Alexandre Drouin wrote:

Hello,

You may have a uniqueKey that points to a field that does not exist anymore.  You can try 
adding an "id" field using Solr's UI or the schema API since you are using the 
managed-schema.


Alexandre Drouin

-Original Message-
From: Michael Joyner [mailto:mich...@newsrx.com]
Sent: July 26, 2016 2:34 PM
To: solr-user@lucene.apache.org
Subject: Can't load schema managed-schema: unknown field 'id'

Help!

What is the best way to recover from:

Can't load schema managed-schema: unknown field 'id'

I was managing the schema on a test collection, fat fingered it, but now
I find out the schema ops seem to be altering all collections on the core?
SolrCloud 5.5.1

-Mike




Can't load schema managed-schema: unknown field 'id'

2016-07-26 Thread Michael Joyner

Help!

What is the best way to recover from:

Can't load schema managed-schema: unknown field 'id'

I was managing the schema on a test collection, fat fingered it, but now
I find out the schema ops seem to be altering all collections on the core?
SolrCloud 5.5.1

-Mike


Re: Rolling upgrade to 5.4 from 5.0 - "bug" caused by leader changes - is there a workaround?

2016-01-21 Thread Michael Joyner

On 01/21/2016 01:22 PM, Ishan Chattopadhyaya wrote:

Perhaps you could stay on 5.4.1 RC2, since that is what 5.4.1 will be
(unless there are last moment issues).

On Wed, Jan 20, 2016 at 7:50 PM, Michael Joyner <mich...@newsrx.com> wrote:


Unfortunately, it really couldn't wait.

I did a rolling upgrade to the 5.4.1RC2 then downgraded everything to
5.4.0 and so far everything seems fine.

Couldn't take the cluster down.




I can wait for 5.4.1 to be official as we are on the now-official 
5.4.0 and so far all is well.


Though, I do find it odd that you can add a copy field declaration that 
duplicates an already existing declaration and one ends up with 
duplicate declarations...




Re: Rolling upgrade to 5.4 from 5.0 - "bug" caused by leader changes - is there a workaround?

2016-01-20 Thread Michael Joyner

Unfortunately, it really couldn't wait.

I did a rolling upgrade to the 5.4.1RC2 then downgraded everything to 
5.4.0 and so far everything seems fine.


Couldn't take the cluster down.

On 01/19/2016 05:03 PM, Anshum Gupta wrote:

If you can wait, I'd suggest to be on the bug fix release. It should be out
around the weekend.

On Tue, Jan 19, 2016 at 1:48 PM, Michael Joyner <mich...@newsrx.com> wrote:


ok,

I just found the 5.4.1 RC2 download, it seems to work ok for a rolling
upgrade.

I will see about downgrading back to 5.4.0 afterwards to be on an official
release ...



On 01/19/2016 04:27 PM, Michael Joyner wrote:


Hello all,

I downloaded 5.4 and started doing a rolling upgrade from a 5.0 solrcloud
cluster and discovered that there seems to be a compatibility issue where
doing a rolling upgrade from pre-5.4 causes the 5.4 nodes to fail with
"unable to determine leader" errors.

Is there a work around that does not require taking the cluster down to
upgrade to 5.4? Should I just stay with 5.3 for now? I need to implement
programmatic schema changes in our collection via solrj, and based on what
I'm reading this is a very new feature and requires the latest (or near
latest) solrcloud.

Thanks!

-Mike









Re: Rolling upgrade to 5.4 from 5.0 - "bug" caused by leader changes - is there a workaround?

2016-01-19 Thread Michael Joyner

ok,

I just found the 5.4.1 RC2 download, it seems to work ok for a rolling 
upgrade.


I will see about downgrading back to 5.4.0 afterwards to be on an 
official release ...



On 01/19/2016 04:27 PM, Michael Joyner wrote:

Hello all,

I downloaded 5.4 and started doing a rolling upgrade from a 5.0 
solrcloud cluster and discovered that there seems to be a 
compatibility issue where doing a rolling upgrade from pre-5.4 
causes the 5.4 nodes to fail with "unable to determine leader" errors.


Is there a work around that does not require taking the cluster down 
to upgrade to 5.4? Should I just stay with 5.3 for now? I need to 
implement programmatic schema changes in our collection via solrj, and 
based on what I'm reading this is a very new feature and requires the 
latest (or near latest) solrcloud.


Thanks!

-Mike




Rolling upgrade to 5.4 from 5.0 - "bug" caused by leader changes - is there a workaround?

2016-01-19 Thread Michael Joyner

Hello all,

I downloaded 5.4 and started doing a rolling upgrade from a 5.0 
solrcloud cluster and discovered that there seems to be a compatibility 
issue where doing a rolling upgrade from pre-5.4 causes the 5.4 nodes to 
fail with "unable to determine leader" errors.


Is there a work around that does not require taking the cluster down to 
upgrade to 5.4? Should I just stay with 5.3 for now? I need to implement 
programmatic schema changes in our collection via solrj, and based on 
what I'm reading this is a very new feature and requires the latest (or 
near latest) solrcloud.


Thanks!

-Mike


Re: eDisMax parser and special characters

2014-10-08 Thread Michael Joyner

Try escaping special chars with a \
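
If the query string is built in code, SolrJ's ClientUtils can do the escaping
(a sketch; everything beyond the escaping call is illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

public class EscapeTermsSketch {

    public static SolrQuery build(String userInput) {
        // ClientUtils.escapeQueryChars escapes every special character,
        // including whitespace, so apply it per term rather than to the whole
        // string: "red - yellow" becomes "red \- yellow".
        StringBuilder q = new StringBuilder();
        for (String term : userInput.trim().split("\\s+")) {
            if (q.length() > 0) {
                q.append(' ');
            }
            q.append(ClientUtils.escapeQueryChars(term));
        }
        return new SolrQuery(q.toString());
    }
}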

On 10/08/2014 01:39 AM, Lanke,Aniruddha wrote:

We are using an eDisMax parser in our configuration. When we search using a 
query term that has a '-' we don't get any results back.

Search term: red - yellow
This doesn’t return any data back but






I get zero results when combining query.set("fq", "{!collapse field=title_s}"); and query.set("group", true); ???

2014-10-01 Thread Michael Joyner

I have a SolrCloud setup with two shards.

When I use query.set("fq", "{!collapse field=title_s}"); the results 
show duplicates because of the sharding.


EX:

{status=0,QTime=1141,params={fl=id,code_s,issuedate_tdt,pageno_i,subhead_s,title_s,type_s,citation_articleTitle_s,citation_articlePageNo_i,citation_corp_s,citation_publicationCode_s,citation_issn_s,citation_articleId_i,citation_scPublicationCode_i,citation_publicationTitle_s,citation_articleIssueDate_dt,score,df=[plain_abstract_en, 
plain_title_en, 
plain_subhead_en],debugQuery=false,uf=-*,start=0,q={!boost 
b=recip(ms(NOW/DAY,issuedate_tdt),3.16e-11,1,1)}dog 
cancer,bf=[plain_title_en, plain_subhead_en],wt=javabin,fq={!collapse 
field=title_s},version=2,defType=edismax,rows=5}}


{articleid:573891,code:LB,formattedCitation:(2004-07-04), 
Plasmid-based hormone therapy could improve disease or age-induced 
wasting, <i>Lab Business Week</i>, 14, ISSN: 
1552-647X,issuedate:108891360,pageno:14,score:0.04496974,subhead:ADViSYS 
Inc.,title:Plasmid-based hormone therapy could improve disease or 
age-induced wasting,type:PressRelease,weight:0}


{articleid:574262,code:NH,formattedCitation:(2004-07-04), 
Plasmid-based hormone therapy could improve disease or age-induced 
wasting, <i>Nursing Home & Elder Business Week</i>, 2, ISSN: 
1552-2571,issuedate:108891360,pageno:2,score:0.044759396,subhead:ADViSYS 
Inc.,title:Plasmid-based hormone therapy could improve disease or 
age-induced wasting,type:PressRelease,weight:0}


FACET COUNTS: subhead_s: ADViSYS Inc. - 2

If I instead use:

query.set("group", true); query.set("group.field", "title_s"); 
query.set("group.main", true); query.set("group.truncate", true); 
query.set("group.facet", true);


I receive back:

{status=0,QTime=72,params={uf=-*,group.main=true,wt=javabin,group.facet=true,version=2,rows=5,defType=edismax,fl=id,code_s,issuedate_tdt,pageno_i,subhead_s,title_s,type_s,citation_articleTitle_s,citation_articlePageNo_i,citation_corp_s,citation_publicationCode_s,citation_issn_s,citation_articleId_i,citation_scPublicationCode_i,citation_publicationTitle_s,citation_articleIssueDate_dt,score,debugQuery=false,df=[plain_abstract_en, 
plain_title_en, plain_subhead_en],start=0,q={!boost 
b=recip(ms(NOW/DAY,issuedate_tdt),3.16e-11,1,1)}dog 
cancer,group.truncate=true,bf=[plain_title_en, 
plain_subhead_en],group.field=title_s,group=true}}


{articleid:573891,code:LB,formattedCitation:(2004-07-04), 
Plasmid-based hormone therapy could improve disease or age-induced 
wasting, <i>Lab Business Week</i>, 14, ISSN: 
1552-647X,issuedate:108891360,pageno:14,score:0.04494973,subhead:ADViSYS 
Inc.,title:Plasmid-based hormone therapy could improve disease or 
age-induced wasting,type:PressRelease,weight:0}


FACET COUNTS: subhead_s: ADViSYS Inc. - 2

??? If I combine the two together I get no results back ??? I was trying 
to combine the two together because I will be searching 45+ million 
records with duplication based on title_s by an approximate factor of 10.


{status=0,QTime=1103,params={facet=true,facet.mincount=1,uf=-*,facet.limit=10,group.main=true,wt=javabin,group.facet=true,version=2,rows=5,defType=edismax,fl=id,code_s,issuedate_tdt,pageno_i,subhead_s,title_s,type_s,citation_articleTitle_s,citation_articlePageNo_i,citation_corp_s,citation_publicationCode_s,citation_issn_s,citation_articleId_i,citation_scPublicationCode_i,citation_publicationTitle_s,citation_articleIssueDate_dt,score,debugQuery=false,df=[plain_abstract_en, 
plain_title_en, plain_subhead_en],start=0,q={!boost 
b=recip(ms(NOW/DAY,issuedate_tdt),3.16e-11,1,1)}dog 
cancer,group.truncate=true,bf=[plain_title_en, 
plain_subhead_en],group.field=title_s,facet.field=subhead_s,group=true,fq={!collapse 
field=title_s}}}




Re: Access solr cloud via ssh tunnel? (Workaround/Jsch)

2014-09-18 Thread Michael Joyner

On 09/16/2014 04:03 PM, Doug Balog wrote:

Not sure if this will work, but try to use ssh to setup a SOCKS proxy via
the  -D  command option.
Then use the socksProxyHost and socksProxyPort via the java command line
(ie java -DsocksProxyHost=localhost)  or
System.setProperty(socksProxyHost,localhost) from your code. Make sure
to specify both the host and the port.
See
http://docs.oracle.com/javase/7/docs/api/java/net/doc-files/net-properties.html


Unfortunately Jsch does not seem to provide the -D socks5-over-ssh 
option.


- In case this may help others -

Because the production system will have direct access to the cluster, and 
this is being set up for accessing the production cloud from our office, 
we instead did the following:


SolrTunnels t = new SolrTunnels();
t.connect();
LBHttpSolrServer server = new LBHttpSolrServer();
server.setParser(new BinaryResponseParser());
server.setAliveCheckInterval(500);
for (SolrHost solr: t.getEndpoints()) {
server.addSolrServer("http://127.0.0.1:" + solr.forward + "/solr/test");
}

WHERE:

import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

import com.jcraft.jsch.JSch;
import com.jcraft.jsch.JSchException;
import com.jcraft.jsch.Session;
import com.newsrx.util.NrxLog;

public class SolrTunnels {

    static final private String sshUser = "autossh";
    static final private String sshPass = "LETMEIN";
    static private String sshHost = "public.solr.gateway.host.com";
    static final private int sshPort = 22;

    static volatile private JSch jsch = new JSch();
    static private Session solrSSH = null;

    public static class SolrHost {
        public String host;
        public int port;
        public int forward;

        public SolrHost() {}

        public SolrHost(String host, int port) {
            super();
            this.host = host;
            this.port = port;
            this.forward = -1;
        }
    }

    final static private List<SolrHost> nodes;
    static {
        nodes = new ArrayList<>();
        nodes.add(new SolrHost("solr1.private", 8983));
        nodes.add(new SolrHost("solr2.private", 8983));
    }

    public SolrTunnels() {
    }

    public void connect() {
        if (solrSSH != null) {
            if (solrSSH.isConnected()) {
                return;
            }
        }

        JSch.setConfig("StrictHostKeyChecking", "no");
        JSch.setConfig("Compression", "none"); // compression sometimes causes ssh transport breakage

        int maxTries = 100;
        do {
            try {
                if (solrSSH != null) {
                    solrSSH.disconnect();
                }
                solrSSH = jsch.getSession(sshUser, sshHost, sshPort);
                solrSSH.setPassword(sshPass);
                solrSSH.connect(1000);
                Iterator<SolrHost> isolr = nodes.iterator();
                while (isolr.hasNext()) {
                    SolrHost solr = isolr.next();
                    solr.forward = solrSSH.setPortForwardingL(0, solr.host, solr.port);
                    Console.log("http://127.0.0.1:" + solr.forward + "/solr");
                }
            } catch (JSchException e) {
                e.printStackTrace();
                try {
                    Console.log("Sleeping 100 ms");
                    Thread.sleep(100);
                } catch (InterruptedException e1) {
                }
            }
        } while (maxTries-- > 0 && !solrSSH.isConnected());
    }

    public Collection<SolrHost> getEndpoints() {
        List<SolrHost> list = new ArrayList<>();
        Iterator<SolrHost> isolr = nodes.iterator();
        while (isolr.hasNext()) {
            SolrHost solr = isolr.next();
            if (solr.forward > 0) {
                list.add(solr);
            }
        }
        return list;
    }

    public void disconnect() {
        if (solrSSH != null) {
            Iterator<SolrHost> isolr = nodes.iterator();
            while (isolr.hasNext()) {
                SolrHost solr = isolr.next();
                try {
                    solrSSH.delPortForwardingL(solr.forward);
                } catch (JSchException e) {
                }
            }
            solrSSH.disconnect();
        }
    }
}



Access solr cloud via ssh tunnel?

2014-09-16 Thread Michael Joyner

I am in a situation where I need to access a solrcloud behind a firewall.

I have a tunnel enabled to one of the zookeepers as a starting point and 
the following test code:


CloudSolrServer server = new CloudSolrServer("localhost:2181");
server.setDefaultCollection("test");
SolrPingResponse p = server.ping();
System.out.println(p.getRequestUrl());

Right now it just hangs without any errors... what additional ports 
need forwarding, and what other configuration needs setting, to access a 
solrcloud over an ssh tunnel or tunnels?