Re: Using Properties in /cdcr doesn't seem to work

2017-06-02 Thread Erick Erickson
You haven't really told us what you tried and what the failure was. Is your problem getting the _configuration_ created or using the system variables after they're created? You need to tell us exactly _what_ you tried and exactly _how_ what you tried didn't work. Details matter, particularly

Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

2017-06-02 Thread Erick Erickson
bq: fq value, say 20 char Well, my guess here is that you're constructing a huge OR clause (that's the usual case for such large fq clauses). It's rare for such a clause to be generated identically very often. Do you really expect to have this _exact_ clause created over and over and

Re: Steps for building solr/lucene code and starting server

2017-06-02 Thread Erick Erickson
You can just put a directive in your solrconfig.xml file that points to the jar in analysis-extras. I generally prefer that to copying things around on the theory that it's one less thing to forget to copy sometime later... Best, Erick On Fri, Jun 2, 2017 at 5:05 PM, Nawab Zada Asad Iqbal

Re: Steps for building solr/lucene code and starting server

2017-06-02 Thread Nawab Zada Asad Iqbal
When I do 'ant server', the libs from "./build/lucene-libs/" are copied over to "./server/solr-webapp/webapp/WEB-INF/lib/". However, my required class is in a lib which is in "./build/contrib/solr-analysis-extras/lucene-libs/". I guess I need to run the contrib target? On Fri, Jun 2, 2017 at

Re: Steps for building solr/lucene code and starting server

2017-06-02 Thread Nawab Zada Asad Iqbal
Hi Erick, "bin/solr start -e techproducts" works fine. It is probably because it is not referring to 'org.apache.lucene.analysis.icu.ICUNormalizer2CharFilterFactory' in the schema.xml? I am not sure what I should try. I am wondering if there is some document about Solr dev setup. On Fri, Jun

Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

2017-06-02 Thread Daniel Angelov
In this case, for example, http://host1:8983/solr/collName/admin/mbeans?stats=true returns stats in the context of the shard of "collName" living on host1, doesn't it? BR Daniel On 02.06.2017 20:00, "Daniel Angelov" wrote: Sorry for the typos in the previous mail,

Re: Upgrading config from 4.5.0 to 6.5.1

2017-06-02 Thread Tony Wang
Hi Nawab, We did exactly what Rick recommended. When you apply your changes from your old configs on top of the originals, it will give you errors for incompatible settings. For example, in the "text_general_edge_ngram" fieldType, side="front" is no longer a valid attribute.

Re: Learn To Rank Questions

2017-06-02 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi, Sorry for the delay, here are my replies: 1. I'm not yet a Spark user (but I'm working on that :)) 2. I'm not sure I understand how you would use a feature that is not a float in a model; in my experience all the learning to rank methods always train and predict from a list of floats.
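
Diego's point that learning-to-rank models consume a vector of floats can be illustrated with a toy linear scorer. This is a minimal sketch, not Solr's own LTR model code; the class name `LinearRanker` and the `asFeature` helper that maps a boolean signal to 0/1 are hypothetical.

```java
/** Toy linear ranking model: every feature is already a float, as LTR models expect. */
final class LinearRanker {
    private final float[] weights;

    LinearRanker(float[] weights) {
        this.weights = weights.clone();
    }

    /** Score = sum of weight_i * feature_i; the feature vector must match the weight count. */
    float score(float[] features) {
        if (features.length != weights.length) {
            throw new IllegalArgumentException("expected " + weights.length + " features");
        }
        float s = 0f;
        for (int i = 0; i < weights.length; i++) {
            s += weights[i] * features[i];
        }
        return s;
    }

    /** Hypothetical helper: turns a non-float (boolean) signal into the 0/1 float a model can use. */
    static float asFeature(boolean flag) {
        return flag ? 1.0f : 0.0f;
    }
}
```

Any non-numeric signal has to be mapped to a number like this before training; the model itself only ever sees floats.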

Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

2017-06-02 Thread Daniel Angelov
Sorry for the typos in the previous mail: "fg" should be "fq". On 02.06.2017 18:15, "Daniel Angelov" wrote: > This means, that querying alias NNN pointing 3 collections, each 10 shards > and each 2 replicas, a query with very long fg value, say 20 char > string.

Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

2017-06-02 Thread Daniel Angelov
This means that when querying alias NNN, which points to 3 collections of 10 shards and 2 replicas each, with a very long fg value, say a 20 char string, the first query with fq will cache all 20 chars 30 times (3 x 10 cores). The next query with the same fg could not use the same cores as the

Using Properties in /cdcr doesn't seem to work

2017-06-02 Thread Webster Homer
In the documentation for Solr cdcr there is an example of a source configuration that uses properties: ${TargetZk} ${SourceCollection} ${TargetCollection}
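
For readers unfamiliar with the `${...}` placeholders in the cdcr example, the following is a toy sketch of how such placeholders are resolved from a property map (in Solr, typically JVM system properties passed with -D). Solr config files also accept a `${name:default}` form. The `PropertyResolver` class is an illustration, not Solr's implementation, and the property values in the usage are made up.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy resolver for ${name} and ${name:default} placeholders (not Solr's own code). */
final class PropertyResolver {
    // Group 1 = property name, group 2 = optional default after the first ':'.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    static String resolve(String text, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.get(m.group(1));
            if (value == null) {
                value = m.group(2); // fall back to the inline default, if any
            }
            if (value == null) {
                throw new IllegalStateException("No value for property " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With `TargetZk` set to `zk1:2181`, `resolve("${TargetZk}/solr", props)` yields `zk1:2181/solr`; an unset property with no default fails loudly, which is one of the failure modes worth ruling out when a cdcr config "doesn't seem to work".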

Re: Can solrcloud be running on a read-only filesystem?

2017-06-02 Thread Erick Erickson
Mike: That's one possibility. What I'm really asking for is to be sure that there's a good reason (yours is one). It's just that I've spent too much time in my life trying to get something to work only to discover that it has marginal utility so I like to ask "is this important enough to take

Re: Can solrcloud be running on a read-only filesystem?

2017-06-02 Thread Mike Drob
To throw out one possibility, a read-only file system has no (low?) possibility of corruption. If you have a static index then you shouldn't need to be doing any recovery. You would still need to run ZK with a RW filesystem, but maybe Solr could work? On Fri, Jun 2, 2017 at 10:15 AM, Erick Erickson

Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

2017-06-02 Thread Erick Erickson
bq: This means that if we have a collection with 2 replicas, there is a chance that 2 queries with identical fq values are served from different replicas of the same shards, so the second query will not use the cached set from the first query, right? Yes. In practice autowarming

Re: Configuration of parallel indexing threads

2017-06-02 Thread Erick Erickson
that's pretty much my strategy. I'll add parenthetically that I often see the bottleneck for indexing to be acquiring the data from the system of record in the first place rather than Solr. Assuming you're using SolrJ, an easy test is to comment out the line that sends to Solr. There's usually
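
Erick's test (skip the send to Solr and see whether throughput changes) can be turned into a small timing harness. A minimal sketch, assuming `source` stands in for reading from the system of record and `sink` for the SolrJ call; both are hypothetical placeholders.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Times an indexing loop so "fetch only" and "fetch + send" runs can be compared. */
final class IndexingBottleneckCheck {
    /** Pulls `count` documents from `source`, hands each to `sink`, and returns elapsed nanoseconds. */
    static long timeLoop(Supplier<String> source, Consumer<String> sink, int count) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            sink.accept(source.get());
        }
        return System.nanoTime() - start;
    }
}
```

Run it once with the real sink and once with a no-op sink (`doc -> { }`), which is the code equivalent of commenting out the send line. If the two timings are close, the bottleneck is upstream of Solr.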

Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

2017-06-02 Thread Daniel Angelov
Thanks for the answer! This means that if we have a collection with 2 replicas, there is a chance that 2 queries with identical fq values are served from different replicas of the same shards, so the second query will not use the cached set from the first query, right? Thanks
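
The per-replica behaviour discussed in this thread can be illustrated with a toy model in which each core keeps its own filter cache, so an identical fq routed to a different replica of the same shard misses. This is a simplified sketch, not Solr's `SolrCache` implementation: real caches store document sets rather than the fq string, and have size limits and autowarming.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model: every core (collection/shard/replica) owns an independent filter cache. */
final class PerCoreFilterCaches {
    private final Map<String, Set<String>> cacheByCore = new HashMap<>();

    /** Looks up fq on one core; returns true on a hit, otherwise caches fq and returns false. */
    boolean lookup(String collection, int shard, int replica, String fq) {
        String core = collection + "_shard" + shard + "_replica" + replica;
        Set<String> cache = cacheByCore.computeIfAbsent(core, k -> new HashSet<>());
        return !cache.add(fq); // add() returns false when fq was already cached
    }

    /** Counts how many cores currently hold an entry for fq. */
    int coresHolding(String fq) {
        int n = 0;
        for (Set<String> cache : cacheByCore.values()) {
            if (cache.contains(fq)) {
                n++;
            }
        }
        return n;
    }
}
```

Routing Daniel's example (3 collections x 10 shards x 2 replicas = 60 cores, one replica per shard per query) through this model puts the fq into 30 of the 60 caches, and a repeat query that lands on the other replica of a shard is a miss.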

Re: Steps for building solr/lucene code and starting server

2017-06-02 Thread Erick Erickson
"ant server" should be sufficient. "dist" is useful for when you have custom _external_ programs (say SolrJ) that you want all the libraries collected in the same place. There's no need to "ant compile" as the "server" target I assume what you're seeing is a ClassNotFound error, right? I'm a bit

Re: Number of requests spike up, when i do the delta Import.

2017-06-02 Thread Erick Erickson
A similar pattern should work with .NET; all that's necessary is a JDBC driver for connecting to the database and a connection to a Solr node. SolrNet will not be as performant as SolrJ, I'd guess, since there's no equivalent to CloudSolrClient. You can still use SolrNet, any connection to any Solr

Re: Can solrcloud be running on a read-only filesystem?

2017-06-02 Thread Erick Erickson
As Susheel says, this is iffy, very iffy. You can disable tlogs entirely through solrconfig.xml, you can _probably_ disable all of the Solr logging. You'd also have to _not_ run in SolrCloud. You say "some of the nodes eventually are stuck in the recovering phase" SolrCloud tries very hard to

Re: _version_ / Versioning using timespan

2017-06-02 Thread Susheel Kumar
I see. You can create a JIRA and submit patch and see if committers agree or have different opinion/suggestion. Thanks, Susheel On Fri, Jun 2, 2017 at 10:01 AM, Sergio García Maroto wrote: > You are right about that but in some cases I may need to reindex my data > and

Re: why MULTILINESTRING can contains polygon in solr spatial search

2017-06-02 Thread David Smiley
Hi, Solr 4.7 is old but is probably okay. Is it easy to try a 6.x version? (Note that the Spatial4j Java package names have changed.) There are also multiple new options pertinent to your scenario:

Re: Enable Gzip compression Solr 6.0

2017-06-02 Thread nilaksh
Hi Rick, I am not sure that Solr can take that stand now that it has stopped producing a standalone WAR (the rationale for which is rather well documented here: https://wiki.apache.org/solr/WhyNoWar). If Solr is asking users not to use standalone containers and wants to be used as a server, then it must

Re: Performance Issue in Streaming Expressions

2017-06-02 Thread Joel Bernstein
Once you've scaled up the export from collection4, you can test the performance of the join by moving the NullStream around the join: parallel(null(innerJoin(collection3, collection4))). Again, you'll want to test with different numbers of workers and replicas to see where you max out performance

Re: Performance Issue in Streaming Expressions

2017-06-02 Thread Joel Bernstein
innerJoin(intersect(innerJoin(collection1, collection2), innerJoin(collection3, collection4)), collection5) Let's focus on innerJoin(collection3, collection4). The first thing to focus on is how fast the export from collection4 is. You can test
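
Joel's advice to measure the export from collection4 first makes sense once you recall that innerJoin requires both input streams to be sorted on the join key, so it can merge them in a single pass and is fed only as fast as each side is exported. The following is a minimal in-memory sketch of such a sorted merge join for illustration; it is not Solr's streaming code, and `Row` is a hypothetical stand-in for a tuple.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy merge join over two lists already sorted by join key (the precondition innerJoin has). */
final class SortedInnerJoin {
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Emits "leftValue|rightValue" for every pair of rows whose keys are equal. */
    static List<String> join(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).key.compareTo(right.get(j).key);
            if (cmp < 0) {
                i++; // left key is smaller: it cannot match anything further right
            } else if (cmp > 0) {
                j++; // right key is smaller: advance the right side
            } else {
                String key = left.get(i).key;
                // Find the run of equal keys on the right side.
                int jEnd = j;
                while (jEnd < right.size() && right.get(jEnd).key.equals(key)) {
                    jEnd++;
                }
                // Pair every left row with this key against the whole right run.
                while (i < left.size() && left.get(i).key.equals(key)) {
                    for (int k = j; k < jEnd; k++) {
                        out.add(left.get(i).value + "|" + right.get(k).value);
                    }
                    i++;
                }
                j = jEnd;
            }
        }
        return out;
    }
}
```

Because each side is consumed once, in order, the join cannot go faster than the slower of the two exports, which is why measuring the collection4 export on its own is a sensible first step.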

Re: _version_ / Versioning using timespan

2017-06-02 Thread Sergio García Maroto
You are right about that, but in some cases I may need to reindex my data and wanted to avoid deleting the full index so I can still serve queries. I thought reindexing with the same version would be handy, or would at least give me that flexibility. On 2 June 2017 at 14:53, Susheel Kumar

Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

2017-06-02 Thread Susheel Kumar
Thanks for the correction, Shawn. Yes, it's only the heap allocation settings that are per host/JVM. On Fri, Jun 2, 2017 at 9:23 AM, Shawn Heisey wrote: > On 6/1/2017 11:40 PM, Daniel Angelov wrote: > > Is the filter cache separate for each host and then for each > > collection

Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

2017-06-02 Thread Shawn Heisey
On 6/1/2017 11:40 PM, Daniel Angelov wrote: > Is the filter cache separate for each host and then for each > collection and then for each shard and then for each replica in > SolrCloud? For example, on host1 we have, coll1 shard1 replica1 and > coll2 shard1 replica1, on host2 we have, coll1 shard2

Re: Spread SolrCloud across two locations

2017-06-02 Thread Shawn Heisey
On 5/29/2017 8:57 AM, Jan Høydahl wrote: > And if you start all three in DC1, you have 3+3 voting, what would > then happen? Any chance of state corruption? > > I believe that my solution isolates manual change to two ZK nodes in > DC2, while yours requires config change to 1 in DC2 and manual >
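
Jan's "3+3 voting" scenario comes down to ZooKeeper's majority rule: an ensemble of n voters needs strictly more than n/2 of them to be reachable. A small sketch of that arithmetic; the class and method names are illustrative only.

```java
/** Majority-quorum arithmetic for a ZooKeeper ensemble split across data centers. */
final class ZkQuorum {
    /** Smallest number of voters that forms a strict majority of the ensemble. */
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** True if the voters that can still talk to each other can form a quorum. */
    static boolean hasQuorum(int ensembleSize, int reachableVoters) {
        return reachableVoters >= quorumSize(ensembleSize);
    }
}
```

With 3 voters in each of two DCs (n = 6), the quorum is 4, so if either DC or the link between them goes down, neither side can form a quorum on its own; with 5 voters, 3 suffice.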

Re: _version_ / Versioning using timespan

2017-06-02 Thread Susheel Kumar
I see the difference now between using _version_ and a custom versionField. The two seem to behave differently. The _version_ field, if used, allows the same version to be updated, and that's how I had assumed a custom versionField would behave too. My question is why do you want to update the document if same

Re: Is the filter cache separate for each host and then for each collection and then for each shard and then for each replica in SolrCloud?

2017-06-02 Thread Susheel Kumar
The heap allocation and cache settings are per host/JVM, not per collection/shard. In SolrCloud you execute queries against a collection, and every other collection may have a different schema, document IDs and so on. So, to answer your question, query1 from coll1 can't use results cached from

Re: _version_ / Versioning using timespan

2017-06-02 Thread Susheel Kumar
Just to confirm again before we go too far: are you able to execute these examples and see the same output given under "Optimistic Concurrency"? https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-In-PlaceUpdates Let me know which example you fail to
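
The "Optimistic Concurrency" rules on that page can be summarized as a decision table on the `_version_` value sent with an update. The following is a minimal sketch of those documented rules, not Solr's internal code; the class name `OptimisticConcurrency` is hypothetical.

```java
/** Decision table for the documented _version_ optimistic-concurrency check (sketch only). */
final class OptimisticConcurrency {
    /**
     * @param requested the _version_ value sent with the update
     * @param existing  the stored document's current _version_, or null if no such document exists
     * @return true if the update would be accepted, false if it would be rejected as a conflict
     */
    static boolean accepts(long requested, Long existing) {
        if (requested > 1) {
            return existing != null && existing == requested; // versions must match exactly
        }
        if (requested == 1) {
            return existing != null; // document must already exist
        }
        if (requested < 0) {
            return existing == null; // document must not exist yet
        }
        return true; // requested == 0: no version check at all
    }
}
```

This is the internal `_version_` behaviour; a custom versionField handled by DocBasedVersionConstraintsProcessorFactory follows different rules, which is the difference this thread is circling.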

Re: Can solrcloud be running on a read-only filesystem?

2017-06-02 Thread Susheel Kumar
I doubt it can run on a read-only file system. Even though there is no ingestion etc., Solr still needs to write to logs/tlogs for syncing/recovering etc. Thnx On Fri, Jun 2, 2017 at 6:56 AM, Wudong Liu wrote: > Hi All: > > We have a normal build/stage -> prod

Re: Number of requests spike up, when i do the delta Import.

2017-06-02 Thread Rick Leir
Vrin We had a good speedup from enabling a SQL cache. You also need to avoid updating the DB tables so the cache does not get flushed. Cheers -- Rick On June 2, 2017 4:49:20 AM EDT, vrindavda wrote: >Thanks Erick , > >Could you please suggest some alternative to go with

Can solrcloud be running on a read-only filesystem?

2017-06-02 Thread Wudong Liu
Hi All: We have a normal build/stage -> prod setup for our production pipeline. We build the Solr index in the build environment and then the index is copied to the prod environment. The SolrCloud cluster in prod seems to work fine when the file system backing it is writable. However, we see

Steps for building solr/lucene code and starting server

2017-06-02 Thread Nawab Zada Asad Iqbal
Hi, I have synced lucene-solr repo because I (will) have some custom code in lucene and solr folders. What are the steps for starting solr server? My schema.xml uses ICUNormalizer2CharFilterFactory (which I see in lucene folder tree), but I don't know how to make it work with solr webapp. I know

Re: Configuration of parallel indexing threads

2017-06-02 Thread gigo314
Thanks for the replies. Just to confirm that I got it right: 1. Since there is no setting to control index writers, is it fair to assume that Solr always indexes at maximum possible speed? 2. The way to control write speed is to control number of clients that are simultaneously posting data,
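
Point 2, controlling write pressure by limiting how many clients post at once, could look like this in a client program. A minimal sketch, assuming the `poster` callback wraps whatever actually sends a batch to Solr (for example a SolrJ `SolrClient.add` call); `BoundedIndexer` and its parameters are hypothetical.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Throttles indexing by limiting how many client threads post batches at the same time. */
final class BoundedIndexer {
    /**
     * Hands every batch to `poster` using a fixed pool of `clients` threads,
     * so at most `clients` batches are in flight at once. Returns how many batches completed.
     */
    static int postAll(List<List<String>> batches, Consumer<List<String>> poster, int clients) {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        AtomicInteger completed = new AtomicInteger();
        for (List<String> batch : batches) {
            pool.submit(() -> {
                poster.accept(batch); // in a real client, the call that sends to Solr
                completed.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```

Raising or lowering `clients` is then the single knob for how hard the indexing side pushes Solr.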

why MULTILINESTRING can contains polygon in solr spatial search

2017-06-02 Thread kjdong
solr-version: 4.7.0, field spec as follows: And I index some MULTILINESTRING shapes (WKT-formatted, the road data), and I query using the "Intersects" spatial predicate like fq=geom:"Intersects(POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30))) distErrPct=0". In fact, I want to query the

Re: Solr Web Crawler - Robots.txt

2017-06-02 Thread Charlie Hull
On 02/06/2017 00:56, Doug Turnbull wrote: Scrapy is fantastic and I use it to scrape search results pages for clients to take quality snapshots for relevance work +1 for Scrapy; it was built by a team at Mydeco.com while we were building their search backend and has gone from strength to

Re: _version_ / Versioning using timespan

2017-06-02 Thread Sergio García Maroto
I had a look at the source code and I see DocBasedVersionConstraintsProcessorFactory if (0 < ((Comparable)newUserVersion).compareTo((Comparable) oldUserVersion)) { // log.info("VERSION returning true (proceed with update)" ); return true; } I can't find a way of overwriting
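
The quoted condition means an update proceeds only when the new version compares strictly greater than the stored one, so re-sending the same version is treated as an old update. A minimal sketch isolating that comparison; `proceedsAllowingEqual` is a hypothetical variant showing the behaviour asked for in this thread, not an existing Solr option.

```java
/** Isolates the version comparison quoted from DocBasedVersionConstraintsProcessorFactory (sketch only). */
final class UserVersionCheck {
    /** Mirrors the quoted check: proceed only if the new version is strictly greater. */
    static <T extends Comparable<T>> boolean proceeds(T newVersion, T oldVersion) {
        return 0 < newVersion.compareTo(oldVersion);
    }

    /** Hypothetical variant that would also let an equal version overwrite the stored document. */
    static <T extends Comparable<T>> boolean proceedsAllowingEqual(T newVersion, T oldVersion) {
        return 0 <= newVersion.compareTo(oldVersion);
    }
}
```

The same comparison applies whether the version field holds longs or dates, since both are compared through `compareTo`.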

Re: Number of requests spike up, when i do the delta Import.

2017-06-02 Thread vrindavda
Thanks Erick, Could you please suggest some alternative to go with SolrNet? @jlman, I tried your way; that does reduce the number of requests, but delta-import still takes longer than full-import. There is no improvement in performance.

Re: _version_ / Versioning using timespan

2017-06-02 Thread Sergio García Maroto
I am using 6.1.0. I tried with two different field types, long and date. I am using this configuration in the solrconfig.xml false UpdatedDateSD I had a look at the wiki page and it says