Re: Production Issue: SOLR node goes to non responsive , restart not helping at peak hours

2019-09-05 Thread Jack Schlederer
My mistake on the link, which should be this: https://lucene.apache.org/solr/guide/7_1/solrcloud-autoscaling-auto-add-replicas.html#implementation-using-autoaddreplicas-trigger --Jack On Thu, Sep 5, 2019 at 11:02 AM Jack Schlederer wrote: > I'd defer to the committers if they have any furt

Re: Production Issue: SOLR node goes to non responsive , restart not helping at peak hours

2019-09-05 Thread Jack Schlederer
to consider when considering your cluster sizing, my org is running only about 50GB of indices, but we run it over 35 nodes with 8GB of heap apiece, each collection with 2+ shards. --Jack On Thu, Sep 5, 2019 at 8:47 AM Doss wrote: > Thanks Eric for the explanation. Sum of all our index s

Different DIH failure behavior on non-sharded and sharded collections

2019-08-26 Thread Jack Schlederer
with a DistributedUpdatesAsyncException when posting that document to another node. I was wondering if this is a known issue with the DIH as of 7.5 and if there's a way to have the DistributedUpdateProcessor sort of "warn and continue" when this type of document is encountered. Thanks in advance! Jack

Restoring and upgrading a standalone index to SolrCloud

2018-10-03 Thread Jack Schlederer
like this to a SolrCloud environment if I can get it into a directory that is shared by all the nodes? Thanks, Jack

Re: ZooKeeper issues with AWS

2018-09-05 Thread Jack Schlederer
Ah, yes. We use ZK 3.4.13 for our ZK server nodes, but we never thought to upgrade the ZK JAR within Solr. We included that in our Solr image, and it's working like a charm, re-resolving DNS names when new ZKs come up with different IPs. Thanks for the help guys! --Jack On Sat, Sep 1, 2018 at 9

Re: ZooKeeper issues with AWS

2018-08-31 Thread Jack Schlederer
(127.0.0.1). Please let us know if you have insight into this issue. Thanks, Jack On Fri, Aug 31, 2018 at 10:40 AM Erick Erickson wrote: > Jack: > > Is it possible to reproduce "manually"? By that I mean without the > chaos bit by the following: > > - Start 3 ZK nodes >

Re: ZooKeeper issues with AWS

2018-08-30 Thread Jack Schlederer
lists all 3 ZK nodes. We're running ZooKeeper version 3.4.13. Thanks, Jack On Thu, Aug 30, 2018 at 4:12 PM Walter Underwood wrote: > How many Zookeeper nodes in your ensemble? You need five nodes to > handle two failures. > > Are your Solr instances started with a zkHost that li

ZooKeeper issues with AWS

2018-08-30 Thread Jack Schlederer
potential workarounds on this issue since? Thanks, Jack Reference: https://issues.apache.org/jira/browse/SOLR-8868

Re: Strange Alias behavior

2018-01-19 Thread Wenjie Zhang (Jack)
Why would you create an alias with an existing collection name? Sent from my iPhone > On Jan 19, 2018, at 14:14, Webster Homer wrote: > > I just discovered some odd behavior with aliases. > > We are in the process of converting over to use aliases in solrcloud. We >

Custom RequestHandler with the Solr api (solrj) that makes a query call back to the server

2017-05-18 Thread Jack Java
Hi, I'm looking for some advice on specific issue that is holding us back. I'm trying to create a custom RequestHandler with the Solr api (solrj) that makes a query call back to the server. I'm not finding any good, run-able examples on-line. Possibly I'm approaching this wrong. Any advice would

Custom RequestHandler with the Solr api (solrj) that makes a query call back to the server

2017-05-18 Thread Jack Java
Hi, I'm looking for some advice on specific issue that is holding us back. I'm trying to create a custom RequestHandler with the Solr api (solrj) that makes a query call back to the server. I'm not finding any good, run-able examples of this on-line. Possibly I'm approaching this wrong. Any advice

Re: solr.StrField or solr.StringField?

2016-05-03 Thread Jack Krupansky
Yeah, that's a typo. The same typo is in the official Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/Putting+the+Pieces+Together [ATTN: Solr Ref Guide Team!] -- Jack Krupansky On Tue, May 3, 2016 at 4:14 PM, John Bickerstaff <j...@johnbickerstaff.com> wrote:

Re: concat 2 fields

2016-04-26 Thread Jack Krupansky
values for the first field. Then you can use Concat to combine the two values. -- Jack Krupansky On Thu, Apr 21, 2016 at 5:29 AM, vrajesh <vrajes...@gmail.com> wrote: > to concatenating two fields to use it as one field from > > http://grokbase.com/t/lucene/solr-user/138vr75h

Re: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread Jack Krupansky
Or should this be higher rated about NY, since it's shorter: * New York Another thought on length norms: with the advent of multi-field dismax with per-field boosting, people tend to explicitly boost the title field so that the traditional length normalization is less relevant. -- Jack

Re: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread Jack Krupansky
Or, maybe you have some rule of relevance that you haven't yet shared - and I mean rule that a user would comprehend and consider valuable, not simply a mechanical rule. -- Jack Krupansky On Wed, Apr 20, 2016 at 8:10 PM, <jimi.hulleg...@svensktnaringsliv.se> wrote: > Ok sure, I ca

Re: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread Jack Krupansky
of lengths, but you'll have to decide what you really want to achieve to tune it. Maybe you could give a couple examples of field values that you feel should be scored differently based on length. -- Jack Krupansky On Wed, Apr 20, 2016 at 7:17 PM, <jimi.hulleg...@svensktnaringsliv.se> wrote

Re: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread Jack Krupansky
is in ClassicSimilarity: https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/lucene/core/src/java/org/apache/lucene/search/similarities/ClassicSimilarity.java#L115 You can always write your own custom Similarity class to override that calculation. -- Jack Krupansky On Wed, Apr 20
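A rough sketch of the arithmetic behind that suggestion, written in Python for brevity rather than as an actual Lucene Similarity subclass. It assumes the classic length norm is 1/sqrt(number of terms) and ignores any index-time boost; the min_length floor and both function names are hypothetical, introduced only to show how an overridden calculation could stop very short fields from being favored.

```python
import math


def classic_length_norm(num_terms: int) -> float:
    # Assumed classic formula: shorter fields get a larger norm.
    return 1.0 / math.sqrt(max(num_terms, 1))


def floored_length_norm(num_terms: int, min_length: int = 5) -> float:
    # Hypothetical variant: treat every field as at least min_length terms
    # long, so fields below the floor all receive the same norm.
    return 1.0 / math.sqrt(max(num_terms, min_length, 1))


if __name__ == "__main__":
    for n in (1, 2, 5, 9, 100):
        print(n, classic_length_norm(n), floored_length_norm(n))
```

With a floor of 5, a two-term title and a five-term title score the same on length alone, while longer fields keep the usual penalty.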

Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-18 Thread Jack Krupansky
There is the separate issue of how many application clients you may have and whether a load balancer would be in front of them. -- Jack Krupansky On Mon, Apr 18, 2016 at 4:34 AM, Jaroslaw Rozanski <s...@jarekrozanski.com> wrote: > Hi, > > How are you executing searches? > >

Re: UUID processor handling of empty string

2016-04-16 Thread Jack Krupansky
processor. -- Jack Krupansky On Sat, Apr 16, 2016 at 12:41 PM, Susmit Shukla <shukla.sus...@gmail.com> wrote: > I am seeing the UUID getting generated when I set the field as empty string > like this - solrDoc.addField("id", ""); with solr 5.3.1 and based on the

Re: UUID processor handling of empty string

2016-04-16 Thread Jack Krupansky
"UUID processor factory is generating uuid even if it is empty." The processor will generate the UUID only if the id field is not specified in the input document. Empty value and value not present are not the same thing. So, please clarify your specific situation. -- Jack Krupans
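The distinction between an empty value and an absent field can be illustrated with a plain Python dict standing in for the input document. This only mirrors the rule as the message states it; it is not the processor's code, and needs_generated_id is a hypothetical helper name.

```python
def needs_generated_id(doc: dict, field: str = "id") -> bool:
    # Per the rule described above: generate a value only when the field
    # is not specified at all. An empty string still counts as specified.
    return field not in doc


if __name__ == "__main__":
    print(needs_generated_id({"title": "no id here"}))   # field absent
    print(needs_generated_id({"id": "", "title": "x"}))  # field present but empty
```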

Re: Solr best practices for many to many relations...

2016-04-15 Thread Jack Krupansky
much better than your average RDBMS SQL JOIN. That would be an interesting benchmark. -- Jack Krupansky On Fri, Apr 15, 2016 at 11:06 AM, Joel Bernstein <joels...@gmail.com> wrote: > I think people are going to be surprised though by the speed of the joins. > The joins also get faster a

Re: Can a field be an array of fields?

2016-04-15 Thread Jack Krupansky
into that third field. You could also store the full name in a fourth field as raw JSON if you really need structure in the result. The third field might have first and last name with a special separator such as "|", although a simple comma is typically sufficient. -- Jack Krupansky On Fri, Ap
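A small sketch of the combined-field idea, assuming the first and last names arrive as two parallel lists in the same order. The function name and sample values are hypothetical; in Solr this logic would live in client code or an update processor.

```python
def combine_names(first_names, last_names, sep="|"):
    # Pair each first name with the last name at the same position and
    # join them with the separator, giving one value per person.
    if len(first_names) != len(last_names):
        raise ValueError("first and last name lists must be the same length")
    return [f"{first}{sep}{last}" for first, last in zip(first_names, last_names)]


if __name__ == "__main__":
    print(combine_names(["John", "Jane"], ["Smith", "Doe"]))
```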

Re: Solr best practices for many to many relations...

2016-04-15 Thread Jack Krupansky
is the target. But, again, it will depend on the nature of the query, the cardinality of each search field, the cross product of cardinality of search fields, etc. -- Jack Krupansky On Fri, Apr 15, 2016 at 10:44 AM, Joel Bernstein <joels...@gmail.com> wrote: > In general the Streaming Express

Re: Solr best practices for many to many relations...

2016-04-15 Thread Jack Krupansky
document of a denormalized table/index. -- Jack Krupansky On Fri, Apr 15, 2016 at 9:56 AM, Joel Bernstein <joels...@gmail.com> wrote: > Solr now has full distributed join capabilities as part of the Streaming > Expression library. Keep in mind that these are distributed joins so they >

Re: Singular Plural Results Inconsistent - SOLR v3.6 and EnglishMinimalStemFilterFactor

2016-04-14 Thread Jack Krupansky
BTW, I did check and that stemmer code is the same today as it was in 3.x, so there should be no change in stemmer behavior there. -- Jack Krupansky On Thu, Apr 14, 2016 at 3:47 PM, Sara Woodmansee <swood...@gmail.com> wrote: > Hi Shawn, > > Thanks so much the feedback. And f

Re: Singular Plural Results Inconsistent - SOLR v3.6 and EnglishMinimalStemFilterFactor

2016-04-14 Thread Jack Krupansky
/220433848_How_effective_is_suffixing -- Jack Krupansky On Thu, Apr 14, 2016 at 1:17 PM, Sara Woodmansee <swood...@gmail.com> wrote: > Hello all, > > I posted yesterday, however I never received my own post, so worried it > did not go through (?) Also, I am not a c

Re: Solr best practices for many to many relations...

2016-04-14 Thread Jack Krupansky
exactly is the problem you need to solve. -- Jack Krupansky On Thu, Apr 14, 2016 at 10:12 AM, Bastien Latard - MDPI AG < lat...@mdpi.com.invalid> wrote: > Hi Guys, > > *I am upgrading from solr 4.2 to 6.0.* > *I successfully (after some time) migrated the config files and

Re: Get number of results in filtered query

2016-04-13 Thread Jack Krupansky
If you just do a faceted query without the filter, each facet will give you the number of results for that country and numResults will give you the total number of results across all countries. But once you apply one or more filters, numResults reflects only the post-filtering documents. -- Jack
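The counting behaviour described here can be mimicked with an in-memory list, as a toy model only. The documents, the country field, and the search function are all hypothetical stand-ins, not Solr calls.

```python
from collections import Counter

# Hypothetical sample documents.
DOCS = [
    {"id": 1, "country": "US"},
    {"id": 2, "country": "US"},
    {"id": 3, "country": "DE"},
    {"id": 4, "country": "FR"},
]


def search(docs, country_filter=None):
    # Apply the optional filter first, then count what is left.
    matched = [d for d in docs if country_filter is None or d["country"] == country_filter]
    facets = Counter(d["country"] for d in matched)
    return {"total": len(matched), "facets": dict(facets)}


if __name__ == "__main__":
    print(search(DOCS))                       # totals across all countries
    print(search(DOCS, country_filter="US"))  # total reflects only the filtered docs
```

Running the unfiltered search once gives both the per-country counts and the overall total, so a second filtered request is only needed for the documents themselves.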

Re: How to search for a First, Last of contact which are stored in differnet multivalued fields

2016-04-13 Thread Jack Krupansky
But no hint anywhere that I know of for how to surface this Lucene feature in Solr. I would suggest the workaround of using an update processor to combine the first and last names into a single multivalued field. -- Jack Krupansky On Wed, Apr 13, 2016 at 4:20 PM, Ahmet Arslan <iori...@yahoo.com.inva

Re: Release date for Solr 6.0

2016-04-07 Thread Jack Krupansky
upgraded from 3 to 4, you may need to force a full optimize on 4 to assure that any lingering old 3 format index segments are in 4 format. -- Jack Krupansky On Thu, Apr 7, 2016 at 12:10 PM, Erick Erickson <erickerick...@gmail.com> wrote: > The release vote just passed, 6.0 should be

Re: maxBooleanClauses in solrconfig.xml is ignored

2016-04-07 Thread Jack Krupansky
Edismax phrase-boost terms? -- Jack Krupansky On Thu, Apr 7, 2016 at 10:28 AM, Shawn Heisey <apa...@elyograg.org> wrote: > On 4/7/2016 8:05 AM, Zaccheo Bagnati wrote: > > I'm trying to set the maxBooleanClauses parameter in solrconfig.xml to > 1024 > > but I still have

Re: Can't get phrase field boosting to work using edismax

2016-04-06 Thread Jack Krupansky
e latter having a higher boost. For example, if the input query exactly matches a product name field, as opposed to simply matching a subset of a longer product name. -- Jack Krupansky On Wed, Apr 6, 2016 at 5:22 AM, <jimi.hulleg...@svensktnaringsliv.se> wrote: > OK, well I'm not sure I a

Re: Can't get phrase field boosting to work using edismax

2016-04-05 Thread Jack Krupansky
://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/solr/core/src/java/org/apache/solr/search/ExtendedDismaxQParser.java I'd say it's a bug, but more a narrow use case that wasn't considered or tested. -- Jack Krupansky On Tue, Apr 5, 2016 at 7:50 AM, <jimi.hulleg...@svensktnaringsliv.se>

Re: Slor 5.5.0 : SolrException: fieldType 'booleans' not found in the schema

2016-04-01 Thread Jack Krupansky
ub.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/solr/example/files/conf/solrconfig.xml strings java.lang.Boolean booleans Hmmm... or maybe the old "booleans" field type should be restored to allow boolean fields to be multivalued? So, somebody should file a Jira on this.

Re: Slor 5.5.0 : SolrException: fieldType 'booleans' not found in the schema

2016-03-31 Thread Jack Krupansky
Exactly which file did you copy? Please give the specific directory. -- Jack Krupansky On Thu, Mar 31, 2016 at 3:24 PM, Girish Tavag <send2mymail...@gmail.com> wrote: > Hi Binoy, > > I copied the entire file schema.xml from the working example provided by > solr itself.

Re: Solr response error 403 when I try to index medium.com articles

2016-03-30 Thread Jack Krupansky
: * Disallow: /_/ Disallow: /m/ Disallow: /me/ Disallow: /@me$ Disallow: /@me/ Disallow: /*/*/edit Sitemap: https://medium.com/sitemap/sitemap.xml -- Jack Krupansky On Wed, Mar 30, 2016 at 1:05 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote: > > 403 means "forbidden"

Re: Solr response error 403 when I try to index medium.com articles

2016-03-29 Thread Jack Krupansky
Medium switches from http to https, so you would need the logic for dealing with https security handshakes. -- Jack Krupansky On Tue, Mar 29, 2016 at 7:54 PM, Jeferson dos Anjos < jefersonan...@packdocs.com> wrote: > I'm trying to index some pages of the medium. But I get error 403. I

Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Jack Krupansky
it's needed for each collection. Presumably the collections can be upgraded in parallel since they are distinct directories. It would be nice to have a SolrIndexUpgrader to run in one shot and discover and upgrade all Solr collections. -- Jack Krupansky On Thu, Mar 24, 2016 at 12:16 PM, Erick Erickson <

Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Jack Krupansky
is on 4.x and upgrades to 5.x with that 4.x data, but will want to upgrade to 6.x shortly, but that will require the 4.x data to be rewritten (force-merged?) to 5.x first. -- Jack Krupansky On Thu, Mar 24, 2016 at 11:38 AM, Bram Van Dam <bram.van...@intix.eu> wrote: > On 23/03/16 15:

Re: Delete by query using JSON?

2016-03-22 Thread Jack Krupansky
See the correct syntax example here: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-SendingJSONUpdateCommands Your query is fine. -- Jack Krupansky On Tue, Mar 22, 2016 at 3:07 PM, Paul Hoffman <p...@flo.org> wrote:
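For reference, a minimal sketch of the JSON delete-by-query command shape from that page, built with Python's standard json module. The query string used here is a made-up placeholder, not the poster's query, and the helper name is hypothetical; the resulting body would be posted to the collection's update handler with a JSON content type.

```python
import json


def delete_by_query_body(query: str) -> str:
    # JSON update command of the form {"delete": {"query": "..."}}.
    return json.dumps({"delete": {"query": query}})


if __name__ == "__main__":
    # "category:obsolete" is a hypothetical example query.
    print(delete_by_query_body("category:obsolete"))
```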

Re: Save Number of words in field

2016-03-21 Thread Jack Krupansky
in a multi-valued text field. That's not as efficient as a custom or script update processor, but avoids creating a custom processor. See: http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html Look for "regex-count-words".

Re: Seasonal searches in SOLR 5.x

2016-03-21 Thread Jack Krupansky
was collected. You could decide whether to store this extra info as an alphanumeric code or as small integers (1-4 for seasons, 1-12 for months). -- Jack Krupansky On Mon, Mar 21, 2016 at 1:26 PM, Ioannis Kirmitzoglou < ioanniskirmitzog...@gmail.com> wrote: > Hi all, > > I would li
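One way the integer codes could be derived at index time, sketched in Python. The mapping is an assumption: Northern Hemisphere meteorological seasons with 1 = winter (Dec-Feb), 2 = spring, 3 = summer, 4 = autumn. The function name is hypothetical.

```python
def season_code(month: int) -> int:
    # month is 1-12; December wraps around to join January and February.
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return (month % 12) // 3 + 1


if __name__ == "__main__":
    for m in range(1, 13):
        print(m, season_code(m))
```

Storing both the month and the season code lets a query such as "all samples collected in summer, any year" become a simple match on one integer field.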

Re: Query behavior.

2016-03-19 Thread Jack Krupansky
, which has an extra AND outside of the inner sub-query, which is a little different than just "(fl:java fl:book)". Sure, the results should be the same, but why insist on the extra level of nested boolean query? -- Jack Krupansky On Thu, Mar 17, 2016 at 12:50 AM, Modassar Ather <modathe

Re: Query behavior.

2016-03-19 Thread Jack Krupansky
committers to address the concern. -- Jack Krupansky On Fri, Mar 18, 2016 at 6:06 AM, Alessandro Benedetti <abenede...@apache.org > wrote: > I think what he tried to explain was : > " Input query : *fl:(java OR book)* > Instead of having the query parser parsing : > *+((fl:ja

Re: Query behavior.

2016-03-19 Thread Jack Krupansky
Now you've confused me... Did you actually intend that q.op=AND was going to perform some function in a query with only two terms and an OR operator? I mean, why not just drop the q.op=AND? -- Jack Krupansky On Wed, Mar 16, 2016 at 1:31 AM, Modassar Ather <modather1...@gmail.com> wrote:

Re: Query behavior.

2016-03-19 Thread Jack Krupansky
You still haven't explained what exactly you are trying to accomplish with that outer level AND/+/MUST. Please be specific - why you insist on "+((fl:java fl:book))" rather than "fl:java fl:book". -- Jack Krupansky On Fri, Mar 18, 2016 at 12:12 AM, Modassar Ather <modath

Re: how to update billions of docs

2016-03-19 Thread Jack Krupansky
documents or a specified query. -- Jack Krupansky On Thu, Mar 17, 2016 at 12:31 PM, Ken Krugler <kkrugler_li...@transpac.com> wrote: > As others noted, currently updating a field means deleting and inserting > the entire document. > > Depending on how you use the field, you might be a

Re: how to update billions of docs

2016-03-19 Thread Jack Krupansky
of documents. Being able to switch indexed mode of a field (or list of fields) is also a mode needed for bulk update (reindex). -- Jack Krupansky On Fri, Mar 18, 2016 at 4:12 AM, Ishan Chattopadhyaya < ichattopadhy...@gmail.com> wrote: > Hi Mohsin, > There's some work in progress

Re: Query behavior.

2016-03-15 Thread Jack Krupansky
they will unpaint themselves out of that corner. -- Jack Krupansky On Tue, Mar 15, 2016 at 1:46 AM, Modassar Ather <modather1...@gmail.com> wrote: > Thanks Jack for your response. > The following jira bug for this issue is already present so I have not > created a new one. > https:/

Re: Re: Avoid Duplication of record in searching

2016-03-15 Thread Jack Krupansky
It's called "live indexing" and is in DSE 4.7: http://docs.datastax.com/en/datastax_enterprise/4.7/datastax_enterprise/srch/srchConfIncrIndexThruPut.html -- Jack Krupansky On Tue, Mar 15, 2016 at 4:41 AM, <rajeshkuma...@maxval-ip.com> wrote: > Hi Jack, > I am using D

Re: Avoid Duplication of record in searching

2016-03-14 Thread Jack Krupansky
- are you using that? -- Jack Krupansky On Mon, Mar 14, 2016 at 12:18 PM, <rajeshkuma...@maxval-ip.com> wrote: > HI, > I am having SOLR Search on Cassandra Table, when I do some updation in > the Cassandra Table to which the SOLR is being configured the Updated record > gets

Re: Solr Queries are very slow - Suggestions needed

2016-03-13 Thread Jack Krupansky
Yeah, there's some good material there, but probably still too inaccessible for the average "help, my queries are slow" inquiry we get so frequently on this list. Another useful page is: https://wiki.apache.org/solr/SolrPerformanceProblems -- Jack Krupansky On Sun, Mar 13, 2016

Re: Solr Queries are very slow - Suggestions needed

2016-03-13 Thread Jack Krupansky
(We should have a wiki/doc page for the "usual list of suspects" when queries are/appear slow, rather than need to repeat the same mantra(s) for every inquiry on this topic.) -- Jack Krupansky On Sun, Mar 13, 2016 at 11:29 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote

Re: Sending text into a number field

2016-03-11 Thread Jack Krupansky
, but... that's not happening.) -- Jack Krupansky On Fri, Mar 11, 2016 at 11:03 AM, Alessandro Benedetti < abenede...@apache.org> wrote: > I agree with Upayavira, > this is an information extraction task, you need to implement your logic to > extract the proper numeric values from the textua

Re: Query behavior.

2016-03-10 Thread Jack Krupansky
how many SHOULD terms are required (Lucene MinShouldMatch.) -- Jack Krupansky On Thu, Mar 10, 2016 at 3:41 AM, Modassar Ather <modather1...@gmail.com> wrote: > Thanks Shawn for pointing to the jira issue. I was not sure that if it is > an expected behavior or a bug or there could hav

Re: ngrams with position

2016-03-10 Thread Jack Krupansky
potentially shuffled. Also, phrase query is an implicit AND while you may really want more of a SpanOr query where the terms are ORed but must be within a close proximity. -- Jack Krupansky On Thu, Mar 10, 2016 at 6:31 AM, Alessandro Benedetti <abenede...@apache.org > wrote: > The r

Re: Indexing Twitter - Hypothetical

2016-03-08 Thread Jack Krupansky
, RAM, and SSD that takes and scale from there, although estimating by more than a factor of ten is problematic given nonlinear effects. -- Jack Krupansky On Tue, Mar 8, 2016 at 11:50 AM, Joseph Obernberger < joseph.obernber...@gmail.com> wrote: > Thank you for the links and explana

Re: Solr Json API How to escape space in search string

2016-03-07 Thread Jack Krupansky
in JSON, with a single backslash. -- Jack Krupansky On Mon, Mar 7, 2016 at 5:49 PM, Iana Bondarska <yana2...@gmail.com> wrote: > Hi All, > could you please tell me if escaping special characters in search keywords > works in json api. > e.g. I have document > { >

Re: Custom field using PatternCaptureGroupFilterFactory

2016-03-07 Thread Jack Krupansky
Great. And you shouldn't need the "{1}" - the square brackets match a single character by definition. -- Jack Krupansky On Mon, Mar 7, 2016 at 12:20 PM, Jay Potharaju <jspothar...@gmail.com> wrote: > Thanks Jack, the problem was my regex. Following regex worked.
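A quick check of that point using Python's re module, as a stand-in for the Java regex engine Solr uses (the two agree on this construct). The character class and sample strings are made up; the poster's actual regex is not shown in this excerpt.

```python
import re

# A character class already matches exactly one character,
# so the {1} quantifier adds nothing.
with_quantifier = re.compile(r"[abc]{1}")
without_quantifier = re.compile(r"[abc]")

samples = ["aabcx", "xyz", "", "cab cab"]

if __name__ == "__main__":
    for s in samples:
        print(repr(s), with_quantifier.findall(s), without_quantifier.findall(s))
```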

Re: Text search NGram

2016-03-07 Thread Jack Krupansky
. -- Jack Krupansky On Mon, Mar 7, 2016 at 10:33 AM, G, Rajesh <r...@cebglobal.com> wrote: > Hi Jack, > > > > Please correct me if I am wrong I added Char filter because > > > > In Analyzer[solr ui] I have provided "Microsoft office" in Field Value > (

Re: Text search NGram

2016-03-07 Thread Jack Krupansky
The charFilter isn't doing anything useful - the white space tokenizer will ignore extra white space anyway. -- Jack Krupansky On Mon, Mar 7, 2016 at 5:44 AM, G, Rajesh <r...@cebglobal.com> wrote: > Hi Team, > > We have the below type and we have indexed the value "title"
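As a loose analogy only (this is Python's str.split, not Solr's whitespace tokenizer), splitting on whitespace already collapses runs of spaces, which is why a separate step to squeeze extra spaces changes nothing. The function name is hypothetical.

```python
def whitespace_tokens(text: str) -> list:
    # With no arguments, split() treats any run of whitespace as one
    # separator and never yields empty tokens.
    return text.split()


if __name__ == "__main__":
    print(whitespace_tokens("Microsoft office"))
    print(whitespace_tokens("  Microsoft     office  "))
```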

Re: Custom field using PatternCaptureGroupFilterFactory

2016-03-06 Thread Jack Krupansky
ene/analysis/pattern/PatternCaptureGroupTokenFilter.html That should probably also say "or if no pattern groups match". To test regular expressions, try an interactive online tool, such as: https://regex101.com/ -- Jack Krupansky On Sun, Mar 6, 2016 at 7:51 PM, Alexandre Rafalovitch <

Re: Solr Deserialize/Read .fdt file

2016-03-06 Thread Jack Krupansky
iate topic for users on this (Solr) list. -- Jack Krupansky On Sun, Mar 6, 2016 at 3:34 PM, Bin Wang <binwang...@gmail.com> wrote: > Hi there, I am interested in understanding all the files in the index > folder. > > here <http://stackoverflow.com/questions/35830426/solr-read-ind

Re: Indexing Twitter - Hypothetical

2016-03-06 Thread Jack Krupansky
a data model. And without a data model, indexing is a fool's errand. In short, no focus, no progress. -- Jack Krupansky On Sun, Mar 6, 2016 at 7:42 AM, Susheel Kumar <susheel2...@gmail.com> wrote: > Entity Recognition means you may want to recognize different entities > name/p

Re: How to use geospatial search to find the locations within polygon

2016-03-05 Thread Jack Krupansky
arch", and have the spatialContextFactory property (called a "setting" for some reason there although elsewhere in the Solr doc XML attributes are referred to as properties) point to that section. The "old" wiki has some more info, but whether that is sufficient to fully configure

Re: How to use geospatial search to find the locations within polygon

2016-03-04 Thread Jack Krupansky
It would be nice for the doc to say that - describe when IsWithin is and isn't appropriate. And give some examples as well for people to copy/mimic. -- Jack Krupansky On Fri, Mar 4, 2016 at 10:20 AM, david.w.smi...@gmail.com < david.w.smi...@gmail.com> wrote: > First of all,

Re: Indexing Twitter - Hypothetical

2016-03-04 Thread Jack Krupansky
As always, the initial question is how you intend to query the data - query drives data modeling. How real-time do you need queries to be? How fast do you need archive queries to be? How many fields do you need to query on? How much entity recognition do you need in queries? -- Jack Krupansky

Re: Indexing Twitter - Hypothetical

2016-03-03 Thread Jack Krupansky
As always, the initial question always needs to be how you wish to query the data - query will drive the data model. I don't want to put words in your mouth as to your query requirements, so... clue us in on your query requirements. -- Jack Krupansky On Thu, Mar 3, 2016 at 2:25 PM, Toke

Re: What is the best way to index 15 million documents of total size 425 GB?

2016-03-03 Thread Jack Krupansky
for a 425GB index in terms of odds of low query latency. -- Jack Krupansky On Thu, Mar 3, 2016 at 12:54 PM, Aneesh Mon N <aneeshm...@gmail.com> wrote: > Hi, > > We are facing a huge performance issue while indexing the data to Solr, we > have around 15 million records in a P

Re: FW: Difference Between Tokenizer and filter

2016-03-03 Thread Jack Krupansky
/term text and position at each step. But even that won't help if you are unable to grasp what is stated on the basic doc page. -- Jack Krupansky On Thu, Mar 3, 2016 at 8:51 AM, G, Rajesh <r...@cebglobal.com> wrote: > Hi Shawn, > > One last question on analyzer. If the format of the index on

Re: Indexing books, chapters and pages

2016-03-01 Thread Jack Krupansky
filtering to eliminate those false positives. There is also the question of maximum phrase size - most phrases tend to be reasonably short, but sometimes people may want to search for an entire paragraph (e.g., a quote) that may span multiple lines on two adjacent pages. -- Jack Krupansky On Tue, Mar 1

Re: Indexing books, chapters and pages

2016-03-01 Thread Jack Krupansky
selects a book group you can re-query with the specific book and then group by chapter. -- Jack Krupansky On Tue, Mar 1, 2016 at 8:08 AM, Zaccheo Bagnati <zacch...@gmail.com> wrote: > Original data is quite well structured: it comes in XML with chapters and > tags to mark the original

Re: Indexing books, chapters and pages

2016-03-01 Thread Jack Krupansky
To start, what is the form of your input data - is it already divided into chapters and pages? Or... are you starting with raw PDF files? -- Jack Krupansky On Tue, Mar 1, 2016 at 6:56 AM, Zaccheo Bagnati <zacch...@gmail.com> wrote: > Hi all, > I'm searching for ideas on how to d

Re: ExtendedDisMax configuration nowhere to be found

2016-02-29 Thread Jack Krupansky
n any case, a proper tombstone is probably the best step at this point. -- Jack Krupansky On Mon, Feb 29, 2016 at 10:39 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote: > It is indeed a problem that the old edismax wiki is result #1 from Google. > I find that annoying as well sinc

Re: ExtendedDisMax configuration nowhere to be found

2016-02-29 Thread Jack Krupansky
It is indeed a problem that the old edismax wiki is result #1 from Google. I find that annoying as well since I also use Google search as my first step in accessing doc on everything. -- Jack Krupansky On Mon, Feb 29, 2016 at 10:03 AM, <jimi.hulleg...@svensktnaringsliv.se> wrote: > Tha

Re: ExtendedDisMax configuration nowhere to be found

2016-02-29 Thread Jack Krupansky
with the current Solr Reference Guide. The old edismax wiki should in fact have a tombstone warning that indicates that it is obsolete and redirect people to the new doc. Out of curiosity, how did you get to that old wiki page in the first place? -- Jack Krupansky On Mon, Feb 29, 2016 at 3:20 AM

Re: ExtendedDisMax configuration nowhere to be found

2016-02-28 Thread Jack Krupansky
So, all this hard work that people have put into Solr to make it more like a Disney theme park is just... wasted... on you? Sigh. Okay, I guess we can't please everyone. -- Jack Krupansky On Sun, Feb 28, 2016 at 5:40 PM, <jimi.hulleg...@svensktnaringsliv.se> wrote: > I have n

Re: ExtendedDisMax configuration nowhere to be found

2016-02-28 Thread Jack Krupansky
is the list of fields to query (qf) and your actual query text (q). I know, I know... some people just can't handle automatic. (Some people hate DisneyLand/World!) -- Jack Krupansky On Sun, Feb 28, 2016 at 5:16 PM, <jimi.hulleg...@svensktnaringsliv.se> wrote: > I'm sorry, but I am still conf

Re: ExtendedDisMax configuration nowhere to be found

2016-02-28 Thread Jack Krupansky
st handler in solrconfig.xml. -- Jack Krupansky On Sun, Feb 28, 2016 at 2:42 PM, <jimi.hulleg...@svensktnaringsliv.se> wrote: > Hi, > > I want to setup ExtendedDisMax in our solr 4.6 server, but I can't seem to > find any example configuration for this. Ie the configuration needed in

Re: Query time de-boost

2016-02-28 Thread Jack Krupansky
in the bq parameter do you need to use negative boost values - in all the other contexts a fractional boost is sufficient. It's unfortunate that the ref guide isn't more clear about this key distinction. Now hopefully we (and others!) are on the same page. -- Jack Krupansky On Sun, Feb 28, 2016

Re: Solr regex documenation

2016-02-27 Thread Jack Krupansky
See: https://lucene.apache.org/core/5_5_0/core/org/apache/lucene/search/RegexpQuery.html https://lucene.apache.org/core/5_5_0/core/org/apache/lucene/util/automaton/RegExp.html I vaguely recall a Jira about regex not working at all in Solr. I don't recall reading about a resolution. -- Jack

Re: Query time de-boost

2016-02-26 Thread Jack Krupansky
. IOW, it de-boosts occurrences of the term. The point remains that you do not need a "negative boost" to de-boost a term. -- Jack Krupansky On Fri, Feb 26, 2016 at 4:01 AM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: > Hi Jack, > I just checked on 5.5 and

Re: Query time de-boost

2016-02-25 Thread Jack Krupansky
0.1 is a fractional boost - all intra-query boosts are multiplicative, not additive, so term^0.1 reduces the term by 90%. -- Jack Krupansky On Wed, Feb 24, 2016 at 11:29 AM, shamik <sham...@gmail.com> wrote: > Binoy, 0.1 is still a positive boost. With title getting the highest
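The arithmetic can be sketched as follows, treating the term's contribution as a single number. This is a simplification for illustration (real scoring combines several factors), and the function name and sample score are hypothetical.

```python
def apply_boost(term_score: float, boost: float) -> float:
    # Boosts multiply the term's contribution; they are not added to it.
    return term_score * boost


if __name__ == "__main__":
    base = 2.0  # hypothetical unboosted contribution of the term
    boosted = apply_boost(base, 0.1)
    reduction = 1.0 - boosted / base
    print(boosted, reduction)
```

A boost between 0 and 1 therefore lowers the term's weight without ever making it negative.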

Re: WhitespaceTokenizerFactory and PathHierarchyTokenizerFactory

2016-02-25 Thread Jack Krupansky
all of the elements of a path or IP as separate terms. Ditto for a query, so you can effectively do both keyword and phrase queries to match individual terms (e.g., path elements) or phrases or sequences of path elements or IP address components. -- Jack Krupansky On Thu, Feb 25, 2016 at 12:41 AM

Re: WhitespaceTokenizerFactory and PathHierarchyTokenizerFactory

2016-02-24 Thread Jack Krupansky
Your statement makes no sense. Please clarify. Express your requirement(s) in plain English first before dragging in possible solutions. Technically, path elements can have embedded spaces. -- Jack Krupansky On Wed, Feb 24, 2016 at 6:53 AM, Anil <anilk...@gmail.com> wrote: > HI,

Re: Reverse Eningeer Query For a Given Result Set?

2016-02-18 Thread Jack Krupansky
positive or false negative as new documents are added to the index that are no longer in the same pattern as the old results but still within the pattern of the original Oracle query. The trick may be whether the delta is meaningful for the actual application use case. -- Jack Krupansky On Thu, Feb 18

Re: Negating multiple array fileds

2016-02-17 Thread Jack Krupansky
anybody grief for using it as a way of compensating for the brain-damaged way that Lucene and Solr handle single-asterisk and negated single-asterisk queries. -- Jack Krupansky On Tue, Feb 16, 2016 at 8:17 PM, Shawn Heisey <apa...@elyograg.org> wrote: > On 2/15/2016 9:22 AM, Jack Krupan

Re: Near Duplicate Documents, "authorization"? tf/idf implications, spamming the index?

2016-02-15 Thread Jack Krupansky
similarity schema could both specify any number of document categories. But that's speculation on my part. -- Jack Krupansky On Mon, Feb 15, 2016 at 6:42 PM, Chris Morley <ch...@depahelix.com> wrote: > Hey Solr people: > > Suppose that we did not want to break up our document se

Re: Negating multiple array fileds

2016-02-15 Thread Jack Krupansky
I should also have noted that your full query: (-persons:*)AND(-places:*)AND(-orgs:*) can be written as: -persons:* -places:* -orgs:* Which may work as is, or can also be written as: *:* -persons:* -places:* -orgs:* -- Jack Krupansky On Mon, Feb 15, 2016 at 1:57 AM, Salman Ansari
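A minimal sketch of building the last form of that query programmatically, assuming the goal is to match documents where none of the listed fields has a value. The helper name `all_fields_empty_query` is hypothetical; the field names are the ones from the message.

```python
def all_fields_empty_query(fields):
    """Build a query matching documents that have no value in any given field.

    The leading *:* selects all documents, so the negated clauses have a
    positive term to subtract from.
    """
    clauses = " ".join(f"-{name}:*" for name in fields)
    return f"*:* {clauses}"


query = all_fields_empty_query(["persons", "places", "orgs"])
print(query)
```

Always emitting the explicit `*:*` avoids depending on whether a given query parser handles a purely negative query on its own.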

Re: "pf" not supported by edismax?

2016-02-14 Thread Jack Krupansky
. -- Jack Krupansky On Mon, Feb 15, 2016 at 12:11 AM, Derek Poh <d...@globalsources.com> wrote: > It is using KeywordTokenizerFactory. It is still consider as tokenized? > > Here's the field definition: > type="gs_keyword_exact" multiValued="true"

Re: "pf" not supported by edismax?

2016-02-14 Thread Jack Krupansky
pf stands for phrase boosting, which implies tokenized text... spp_keyword_exact sounds like it is not tokenized. -- Jack Krupansky On Sun, Feb 14, 2016 at 10:08 PM, Derek Poh <d...@globalsources.com> wrote: > Hi > > Correct me If I am wrong, edismax is an extensio

Re: Negating multiple array fileds

2016-02-14 Thread Jack Krupansky
Due to a bug (or poorly designed feature), you need to explicitly include a non-negative query term in a purely negative sub-query. Usually this means using *:* to select all documents. Note that the use of parentheses introduces a sub-query. So, (-persons:*) should be (*:* -persons:*). -- Jack

Re: optimize requests that fetch 1000 rows

2016-02-12 Thread Jack Krupansky
is that? How many fields are in fl? Any function queries in fl? -- Jack Krupansky On Fri, Feb 12, 2016 at 4:57 AM, Matteo Grolla <matteo.gro...@gmail.com> wrote: > Hi Jack, > tell me if I'm wrong but qtime accounts for search time excluding the > fetch of stored fields (I have a 90ms q

Re: query knowledge graph

2016-02-12 Thread Jack Krupansky
"knowledge graph" is kind of vague - what did you have in mind? An example would help. -- Jack Krupansky On Fri, Feb 12, 2016 at 7:27 AM, Midas A <test.mi...@gmail.com> wrote: > Please suggest how to create query knowledge graph for e-commerce > application . > >

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Jack Krupansky
Is this a scenario that was working fine and suddenly deteriorated, or has it always been slow? -- Jack Krupansky On Thu, Feb 11, 2016 at 4:33 AM, Matteo Grolla <matteo.gro...@gmail.com> wrote: > Hi, > I'm trying to optimize a solr application. > The bottleneck are queri

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Jack Krupansky
wildcards or function queries, or is it very simple keywords? How many operators? Have you used the debugQuery=true parameter to see which search components are taking the time? -- Jack Krupansky On Thu, Feb 11, 2016 at 9:42 AM, Matteo Grolla <matteo.gro...@gmail.com> wrote: >

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Jack Krupansky
Good to know. Hmmm... 200ms for 10 rows is not outrageously bad, but still relatively bad. Even 50ms for 10 rows would be considered barely okay. But... again it depends on query complexity - simple queries should be well under 50 ms for decent modern hardware. -- Jack Krupansky On Thu, Feb 11

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Jack Krupansky
Again, first things first... debugQuery=true and see which Solr search components are consuming the bulk of qtime. -- Jack Krupansky On Thu, Feb 11, 2016 at 11:33 AM, Matteo Grolla <matteo.gro...@gmail.com> wrote: > virtual hardware, 200ms is taken on the client until response i
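As a rough sketch of that first step, the snippet below builds a request URL with debugQuery=true and ranks search components by the time reported in the debug output. The base URL, the core name `mycore`, and the sample response are made up for illustration, and the `debug` / `timing` / `process` layout is an assumption about the JSON response that should be checked against your Solr version.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, query):
    """Return a select URL that asks Solr to include debug information."""
    params = {"q": query, "debugQuery": "true", "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


def slowest_components(response, phase="process"):
    """Rank components by reported time, slowest first.

    Assumes response["debug"]["timing"][phase] maps component names to
    dicts holding a "time" value, alongside a total "time" number.
    """
    timing = response.get("debug", {}).get("timing", {}).get(phase, {})
    per_component = {
        name: entry["time"]
        for name, entry in timing.items()
        if isinstance(entry, dict) and "time" in entry
    }
    return sorted(per_component.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical host and core name.
url = debug_query_url("http://localhost:8983/solr/mycore", "title:solr")

# Made-up response fragment, only to show the shape the function expects.
sample = {
    "debug": {
        "timing": {
            "time": 95.0,
            "process": {
                "time": 90.0,
                "query": {"time": 70.0},
                "facet": {"time": 15.0},
                "highlight": {"time": 5.0},
            },
        }
    }
}
ranked = slowest_components(sample)
```

If one component dominates the ranking, that is the place to start tuning before looking at row counts or stored-field retrieval.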

Re: Need to move on SOlr cloud (help required)

2016-02-10 Thread Jack Krupansky
, then that one node should be replaced. There are indeed plenty of good reasons to prefer SolrCloud over traditional master-slave replication, but so far you haven't touched on any of them. How much data (number of documents) do you have? What is your typical query latency? -- Jack Krupansky

Re: Solr architecture

2016-02-08 Thread Jack Krupansky
can execute them or if they require fanout to other shards and then aggregation of results from those other shards. -- Jack Krupansky On Mon, Feb 8, 2016 at 11:24 AM, Erick Erickson <erickerick...@gmail.com> wrote: > Short form: You really have to prototype. Here's the long form:
