Re: [basex-talk] Full-text bug (?) - all words / any word + stemming

2022-10-28 Thread Christian Grün
Thanks, Hans-Jürgen. I have added your observations to the existing GitHub issue on misbehaving XQFT expressions. Best, Christian On Thu, Oct 27, 2022 at 4:59 PM Hans-Juergen Rennau wrote: > > Hello, > > I encountered a behaviour of full-text search which appears to me a bug: > > (1) OK > basex

[basex-talk] Full-text bug (?) - all words / any word + stemming

2022-10-27 Thread Hans-Juergen Rennau
Hello, I encountered a behaviour of full-text search which appears to me a bug:
(1) OK: basex "'adverse' contains text 'adverse' using stemming" => true
(2) NOTOK: basex "'adverse' contains text 'adverse' all words using stemming" => false
(3) NOTOK: basex "'adverse' contains text 'adverse' any word usi

Re: [basex-talk] Full text search with non-Latin scripts

2022-09-26 Thread Christian Grün
Hi Andreas, Sorry for the late reply. Fuzzy/approximate search is currently limited to the Latin character set. Best regards, Christian Andreas Hartmann wrote on Sat, 17 Sept 2022, 12:33: > Hi, > > I have problems with full text search in 10.1, when non-Latin script is > involved. Expre

Re: [basex-talk] Full Text bug (?) - window ... words

2022-09-25 Thread Christian Grün
Hi Hans-Jürgen, You are right. I’ve created an issue to get this fixed [1]. Best, Christian [1] https://github.com/BaseXdb/basex/issues/2141 On Tue, Sep 13, 2022 at 4:43 PM Hans-Juergen Rennau wrote: > > Dear BaseX people, > > it seems to me there is a bug concerning Full Text Search, using

[basex-talk] Full text search with non-Latin scripts

2022-09-17 Thread Andreas Hartmann
Hi, I have problems with full text search in 10.1, when non-Latin script is involved. Expressions such as
"βάναυσος" contains text "67644567" using fuzzy
"67644567" contains text "βάναυσος" using fuzzy
"humbug" contains text "βάναυσος" using fuzzy
all yield "true". It's the same with Hebrew stri

Re: [basex-talk] Full-Text-Search Options

2020-09-19 Thread Günter Dunz-Wolff
Sorry, it was my fault. Problem solved. Sorry again. Kind regards Guenter

[basex-talk] Full-Text-Search Options

2020-09-19 Thread Günter Dunz-Wolff
Dear all, I’m using ft:mark with Options. In older versions of BaseX it worked fine, but with 9.3 I don’t get any results with the “distance” option and the “not in” option. Maybe my syntax is deprecated? Example distance: ft:mark($ele[.//text() contains text {$query_string} all using wildcards

Re: [basex-talk] Full-text index: searches for common words in another node. Does it take a lot of time?

2020-05-19 Thread ETANCHAUD Fabrice
on for things like governance, mediocracy... All the best from the French west coast, Fabrice Etanchaud From: Sebastian Guerrero Sent: Monday, 18 May 2020 20:32 To: ETANCHAUD Fabrice Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Full-text index

Re: [basex-talk] Full-text index: searches for common words in another node. Does it take a lot of time?

2020-05-18 Thread Sebastian Guerrero
really appreciated working with BaseX that time, because others were in > a kind of Java/relational mapping hell... Me, I just had to add XML > documents, reindex, and sometimes purge deleted items. > > Best, > Fabrice > > -- > *From:* BaseX-Talk on behalf

Re: [basex-talk] Full-text index: searches for common words in another node. Does it take a lot of time?

2020-05-18 Thread ETANCHAUD Fabrice
k on behalf of Sebastian Guerrero Sent: Monday, 18 May 2020 17:23 To: BaseX Subject: [basex-talk] Full-text index: searches for common words in another node. Does it take a lot of time? Hi everybody. I'm here again with my doubts. Thank you for your patience. ^^ I have a database of tradem

[basex-talk] Full-text index: searches for common words in another node. Does it take a lot of time?

2020-05-18 Thread Sebastian Guerrero
Hi everybody. I'm here again with my doubts. Thank you for your patience. ^^ I have a database of trademarks with a full-text index for two nodes: **:mark-identification,*:party-name*. [1] Where "*mark-identification*" contains the name of the trademark, and " *party-name*" contains the name of

Re: [basex-talk] Full text and stopwords

2020-02-13 Thread Christian Grün
Hi Ben, > According to the docs, a stopword list can be used to decrease the size > of the full text index. I had no problems when using this list while > creating a database. > > Is it also possible to use this list for other purposes? Yes. As it’s a simple word list, you can do with it what

Re: [basex-talk] Full text and stopwords

2020-02-12 Thread Ben Engbers
On 12-02-2020 at 10:21, Ben Engbers wrote: > Hi Christian, > Would it be a good approach to create a separate database for stop words and sentiments? > > Cheers, > Ben >

[basex-talk] Full text and stopwords

2020-02-12 Thread Ben Engbers
Hi Christian, According to the docs, a stopword list can be used to decrease the size of the full text index. I had no problems when using this list while creating a database. Is it also possible to use this list for other purposes? 1 According to XQueryX 3.1.pdf it is possible to use a sequence

Re: [basex-talk] Full-text index on mixed content

2019-09-23 Thread Christian Grün
Hi Daniel, Thanks for your mail. Just a short while ago, we had thoughts on how to extend indexing and query rewriting without completely overhauling our optimization engine, so it might be worth sharing this idea. At the moment, as you may know, only text nodes and attribute values end up in the B

[basex-talk] Full-text index on mixed content

2019-09-18 Thread Schopper, Daniel
Dear all, chatting after a session of the ongoing TEI conference ( https://graz-2019.tei-c.org) I was asked about plans to support fulltext indexes on mixed content nodes in BaseX – I did not know of any, so I wanted to pass the question on to this list: Is there a plan to implement this feature in

Re: [basex-talk] Full-Text Search long s

2019-07-26 Thread Christian Grün
Hi Günter, You can take advantage of the unicode normalization features of XQuery: declare function local:normalize($string) { $string => normalize-unicode('NFKD') => replace('\p{IsCombiningDiacriticalMarks}', '') }; for $text in ('Büchſe', 'Buͤchſe') return local:normalize($t

Re: [basex-talk] Full-Text Search long s

2019-07-26 Thread Markus Wittenberg
Hi Guenter, you should have a look at the matches [1] function and work with regular expressions to perform this task. Best regards, Markus [1] http://www.xqueryfunctions.com/xq/fn_matches.html On 26.07.2019 at 17:39, Günter Dunz-Wolff wrote: Hi all, I’m working since some years on a d

[basex-talk] Full-Text Search long s

2019-07-26 Thread Günter Dunz-Wolff
Hi all, I have been working for some years on a digital edition of the works of a former German author. My transcription of those works contains lots of Gothic characters like the old German long s (Unicode: LATIN SMALL LETTER LONG S). For example: Büchſe (exactly Buͤchſe). In my Full-Text-Search my g

Re: [basex-talk] Full-Text

2018-07-23 Thread Ветошкин Владимир
Hi, Christian! Unfortunately, not :( 22.07.2018, 18:55, "Christian Grün": This seems to be a limitation of the Russian stemmer implementation, which we took from the Apache Lucene project. Maybe we could replace it with a more sophisticated implementation. Do you have some experience with other ste

Re: [basex-talk] Full-Text

2018-07-22 Thread Christian Grün
This seems to be a limitation of the Russian stemmer implementation, which we took from the Apache Lucene project. Maybe we could replace it with a more sophisticated implementation. Do you have some experience with other stemmers that are available in the wild? Ветошкин Владимир wrote on Sun.,

Re: [basex-talk] Full-Text

2018-07-22 Thread Ветошкин Владимир
Hi! After some tests of search (using stemming, using language ru) I have found several problems. E.g.: if I search for "кузов", it doesn't find "кузова". 20.07.2018, 14:26, "Ветошкин Владимир": Christian, you're a genius :) Thank you very much for your help! 20.07.2018, 14:19, "Christian Grün"

Re: [basex-talk] Full-Text

2018-07-20 Thread Ветошкин Владимир
Christian, you're a genius :) Thank you very much for your help! 20.07.2018, 14:19, "Christian Grün": I think I found the missing pieces: • In your full-text index, you used non-default options (which is completely fine) • In the rewritten query, these options cannot be applied to your query (because, once a

Re: [basex-talk] Full-Text

2018-07-20 Thread Christian Grün
I think I found the missing pieces: • In your full-text index, you used non-default options (which is completely fine) • In the rewritten query, these options cannot be applied to your query (because, once again, they are not known at compile time). Your query should yield the expected results if yo

Re: [basex-talk] Full-Text

2018-07-20 Thread Ветошкин Владимир
Query plan (0 rows): db:enforceindex 000999~ автомобиль

Re: [basex-talk] Full-Text

2018-07-20 Thread Christian Grün
> These examples work differently. So if I read this correctly, the number of results for 1. is still identical, right? However, in the second query in 2., no results are returned. Could you report the query plan? It will give us some insight into what happens here. > 1. > (# db:enforceindex #) { >

Re: [basex-talk] Full-Text

2018-07-20 Thread Ветошкин Владимир
Hmm.. Yes, it works! But! These examples work differently.
1.
(# db:enforceindex #) { for $db in db:list()[starts-with(.,'000999~')] return db:open($db)//*[text() contains text { 'болт' } any] }
378 rows
let $dbs := for $i in db:list()[starts-with(.,'000999~')] return $i for $db in $dbs let $ft := ft:sea

Re: [basex-talk] Full-Text

2018-07-20 Thread Christian Grün
> let $dbs := for $i in db:list()[starts-with(.,'000999~')] return $i > for $db in $dbs > for $doc in db:open($db)/.//*[(# db:enforceindex #) { text() contains text { > 'TEN-9258' } any }] > return $doc Maybe you’ll have to use the pragma on top of your expression: (# db:enforceindex #) { for

Re: [basex-talk] Full-Text

2018-07-20 Thread Ветошкин Владимир
in two collections: - A big readonly collection of all the past updates, indexed once - A small/medium sized collection whose full text index can be recreated in an acceptable time after each update. At the end of a predefined time period, you have to add the live collection to the readonly one, r

Re: [basex-talk] Full-Text

2018-07-20 Thread Christian Grün
n two collections: > > - A big readonly collection of all the past updates, indexed once > > - A small/medium sized collection whose full text index can be > recreated in an acceptable time after each update. > > At the end of a predefined time period, you have

Re: [basex-talk] Full-Text

2018-07-19 Thread Ветошкин Владимир
ne. Best regards from France, Fabrice Etanchaud [1] http://docs.basex.org/wiki/Indexes#Updates From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Sent: Thursday, 21 June 2018 16:02 To: BaseX Subject: [basex-talk] Full-Text Hi, everyone! Is there any wa

Re: [basex-talk] Full-Text

2018-06-25 Thread Ветошкин Владимир
reindex it, and truncate the live one. Best regards from France, Fabrice Etanchaud [1] http://docs.basex.org/wiki/Indexes#Updates From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Sent: Thursday, 21 June 2018 16:02 To: BaseX Subject: [basex-talk] Full

Re: [basex-talk] Full-Text

2018-06-25 Thread Alexander Shpack
d of a predefined time period, you have to add the live > collection to the readonly one, reindex it, and truncate the live one. > > > > Best regards from France, > > Fabrice Etanchaud > > > > [1] http://docs.basex.org/wiki/Indexes#Updates > > > > > >

Re: [basex-talk] Full-Text

2018-06-25 Thread Ветошкин Владимир
seX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of ???? ???? Sent: Thursday, 21 June 2018 16:02 To: BaseX Subject: [basex-talk] Full-Text Hi, everyone! Is there any way to index only imported xml-files? Now, when I import xml-files the full-text index is deleted. After imp

Re: [basex-talk] Full-Text

2018-06-25 Thread Alexander Shpack
> > At the end of a predefined time period, you have to add the live > collection to the readonly one, reindex it, and truncate the live one. > > > > Best regards from France, > > Fabrice Etanchaud > > > > [1] http://docs.basex.org/wiki/Indexes#Updates > >

Re: [basex-talk] Full-Text

2018-06-25 Thread Ветошкин Владимир
live collection to the readonly one, reindex it, and truncate the live one. Best regards from France, Fabrice Etanchaud [1] http://docs.basex.org/wiki/Indexes#Updates From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of ???? Sent: Thursday, 21 June 2018 16:02 To:

Re: [basex-talk] Full-Text

2018-06-25 Thread Ветошкин Владимир
u can also specify a list of element or attribute names to be indexed, and thereby reduce the time needed to reindex. Best regards, Fabrice From: Ветошкин Владимир [mailto:en-tra...@yandex.ru] Sent: Monday, 25 June 2018 09:42 To: Fabrice ETANCHAUD; BaseX Subject: Re: [basex-talk] Full-Text Hi, Fabrice

Re: [basex-talk] Full-Text

2018-06-25 Thread Alexander Shpack
> > > > > > > *From:* BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] *On > behalf of* > *Sent:* Thursday, 21 June 2018 16:02 > *To:* BaseX > *Subject:* [basex-talk] Full-Text > > > > Hi, everyone! > > > >

Re: [basex-talk] Full-Text

2018-06-25 Thread Fabrice ETANCHAUD
] Sent: Monday, 25 June 2018 09:42 To: Fabrice ETANCHAUD; BaseX Subject: Re: [basex-talk] Full-Text Hi, Fabrice! Thank you. All databases constantly change. That is why there is no way to single out "a big readonly collection" :( Maybe it is possible to use some other incremental indexes?

Re: [basex-talk] Full-Text

2018-06-25 Thread Ветошкин Владимир
sex.org/wiki/Indexes#Updates From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Sent: Thursday, 21 June 2018 16:02 To: BaseX Subject: [basex-talk] Full-Text Hi, everyone! Is there any way to index only imported xml-files? Now, when I import xml-files th

Re: [basex-talk] Full-Text

2018-06-21 Thread Fabrice ETANCHAUD
://docs.basex.org/wiki/Indexes#Updates From: BaseX-Talk [mailto:basex-talk-boun...@mailman.uni-konstanz.de] On behalf of Sent: Thursday, 21 June 2018 16:02 To: BaseX Subject: [basex-talk] Full-Text Hi, everyone! Is there any way to index only imported xml-files? Now, when I import

[basex-talk] Full-Text

2018-06-21 Thread Ветошкин Владимир
Hi, everyone! Is there any way to index only imported xml-files? Now, when I import xml-files the full-text index is deleted. After importing I recreate the whole full-text index and it takes too much time :( -- Best regards, Ветошкин Владимир Владимирович

Re: [basex-talk] Full-text lemmatizing and xml:lang

2017-07-04 Thread Christian Grün
Thanks. I’ll keep this proposal in mind, and think about further implications. If we decided one day to make the full-text index updatable (which would be a nice feature, but a lot of work), we would probably need to reindex sub-trees with modified language attributes. On Tue, Jul 4, 2017 at 8:3

Re: [basex-talk] Full-text lemmatizing and xml:lang

2017-07-03 Thread Kristian Kankainen
Yes, you are correct. During index building, only Häuser is lemmatized, thus //div[text() contains text { "houses","Häuser" } using language 'de' using stemming ] returns only the element with Häuser. But a query without stemming and language: //div[text() contains text { "houses","Häu

Re: [basex-talk] Full-text lemmatizing and xml:lang

2017-07-03 Thread Christian Grün
To be sure if I understood you correctly: > * If STEMMING is set to true, then the input to the stemmer should be > filtered by matching the xml:lang and the LANGUAGE option. Text that is sent > to the tokenizer could be left as is and not be filtered by matching > LANGUAGE (see next point). So y

Re: [basex-talk] Full-text lemmatizing and xml:lang

2017-07-03 Thread Kristian Kankainen
Hi Christian, To refine the proposal. It would be great if the full-text index could be set up to consider xml:lang attributes in the following way: * If STEMMING is set to true, then the input to the stemmer should be filtered by matching the xml:lang and the LANGUAGE option. Text that is s

Re: [basex-talk] Full-text lemmatizing and xml:lang

2017-07-02 Thread Christian Grün
Hi Kristian, Right now, xml:lang attributes are completely ignored when indexing full-text. It’s an interesting idea to exclude texts that are marked with languages different to the one that is currently applied; I will think about it. However, I should have mentioned that the language option is

Re: [basex-talk] Full-text lemmatizing and xml:lang

2017-07-02 Thread Kristian Kankainen
Perhaps a proposal below. 27.06.2017 21:49 Christian Grün wrote: It is currently not possible to work with different languages in a single database. This is mostly because all normalized tokens will end up in the same internal index, and it would be a lot of effort to diversify this software

Re: [basex-talk] Full-text lemmatizing and xml:lang

2017-07-01 Thread Xavier-Laurent SALVADOR
the source document a requirement in your application? > > > > Thanks, > > Vincent > > > > *From:* basex-talk-boun...@mailman.uni-konstanz.de [mailto: > basex-talk-boun...@mailman.uni-konstanz.de] *On Behalf Of *Kristian > Kankainen > *Sent:* Friday, June 30, 201

Re: [basex-talk] Full-text lemmatizing and xml:lang

2017-07-01 Thread Kristian Kankainen
ect: Re: [basex-talk] Full-text lemmatizing and xml:lang   Hello Sorry for being slow in reception, being a full-time father of two kids is my only excuse. Thank you for enlightening answers. At first creating a separate database felt wrong and stupid, but after a while it felt just right

Re: [basex-talk] Full-text lemmatizing and xml:lang

2017-06-30 Thread Lizzi, Vincent
[mailto:basex-talk-boun...@mailman.uni-konstanz.de] On Behalf Of Kristian Kankainen Sent: Friday, June 30, 2017 5:27 PM To: Xavier-Laurent SALVADOR ; Christian Grün Cc: BaseX Subject: Re: [basex-talk] Full-text lemmatizing and xml:lang Hello Sorry for being slow in reception, being a full-time

Re: [basex-talk] Full-text lemmatizing and xml:lang

2017-06-30 Thread Kristian Kankainen
Hello Sorry for being slow in reception, being a full-time father of two kids is my only excuse. Thank you for enlightening answers. At first creating a separate database felt wrong and stupid, but after a while it felt just right and helping to organize different language elements via aggre

Re: [basex-talk] Full-text lemmatizing and xml:lang

2017-06-27 Thread Xavier-Laurent SALVADOR
Hi, After reading Christian's answer ( :-) ), I thought it could be interesting to sort your docs according to @xml:lang and create a new DB next to your corpus: -- distinct-values( file:children('input-dir')[matches(.,'xml$')] ! (doc(.)//@xml:lang) ) ! db:create(

Re: [basex-talk] Full-text lemmatizing and xml:lang

2017-06-27 Thread Christian Grün
Hi Kristian, It is currently not possible to work with different languages in a single database. This is mostly because all normalized tokens will end up in the same internal index, and it would be a lot of effort to diversify this software behavior. As Xavier pointed out (thanks!), the best way

Re: [basex-talk] Full-text lemmatizing and xml:lang

2017-06-27 Thread Xavier-Laurent SALVADOR
Hi Kristian, This is useful for automatically creating databases according to the xml:lang attribute: let $dir := '/Users/me/myDesktop/' for $file in file:list($dir)[matches(.,'xml')] return let $flag := (data(doc($dir||$file)/div/@xml:lang)) return db:create("DB", $dir||$file, (), map { 'ft
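
The one-database-per-language idea from this thread could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the truncated message above: the `corpus-` database name prefix is hypothetical, the root element is assumed to be `div` as in the snippet, and the option names `ftindex`, `stemming` and `language` are the lower-case forms of the BaseX database options.

```xquery
(: Sketch: group input files by the xml:lang of their root element
   and create one full-text-indexed database per language.
   The directory and the 'corpus-' prefix are placeholders. :)
let $dir := '/Users/me/myDesktop/'
for $file in file:list($dir)[matches(., '\.xml$')]
let $lang := string(doc($dir || $file)/div/@xml:lang)
where $lang != ''            (: skip documents without a language tag :)
group by $lang
return db:create(
  'corpus-' || $lang,        (: hypothetical naming scheme :)
  $file ! ($dir || .),       (: all files that share this language :)
  (),
  map { 'ftindex': true(), 'stemming': true(), 'language': $lang }
)
```

Each database then stems with its own language, so a query has to address the database that matches the language of the search terms.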

[basex-talk] Full-text lemmatizing and xml:lang

2017-06-27 Thread Kristian Kankainen
Hello I have documents with text in several languages. When creating a database in BaseX I can choose *one* language for stemming for the full-text search index. Is there a way BaseX could lemmatize according to each element's xml:lang attribute? Best regards Kristian K

Re: [basex-talk] Full-Text Search with Stopwords: corner case hehavior

2016-01-05 Thread Christian Grün
Hi Ron, > Here is another strange behavior (not involving the thesaurus): This time it’s completely due to the spec. For some reason, the word counter is incremented for both the input and query strings [1]. Because of that, the following query returns true, because "few" in the input string wil

Re: [basex-talk] Full-Text Search with Stopwords: corner case hehavior

2016-01-05 Thread Ron Katriel
Christian, Here is another strange behavior (not involving the thesaurus):     "Bayer Pharma AG" contains text "community medical associates" using stop words ("community", "medical", "associates") returns ‘true’ while     "Bayer" contains text "community medical associates" using stop words

Re: [basex-talk] Full-Text Search with Stopwords: corner case hehavior

2016-01-05 Thread Ron Katriel
Good catch. Case appears to also play a role. The following does not match     "samsung" contains text "samsung bioepis co., ltd." using fuzzy using stop words ( "co", "ltd") using thesaurus at "thesaurus.xml" even when the thesaurus contains the synonym "Samsung Bioepis Co., Ltd.” I tried the

Re: [basex-talk] Full-Text Search with Stopwords: corner case hehavior

2016-01-05 Thread Christian Grün
Phew… My guess is that no one has seriously looked at the interplay between stop words and the thesaurus so far ;) Maybe (lower/upper) case plays a role, too? On Tue, Jan 5, 2016 at 4:26 PM, Ron Katriel wrote: > Hi Christian, > > One follow up question. I thought stop words work in concert with

Re: [basex-talk] Full-Text Search with Stopwords: corner case hehavior

2016-01-05 Thread Ron Katriel
Hi Christian, One follow up question. I thought stop words work in concert with the thesaurus but I came across a case where they do not seem to. The following query returns false     "Samsung" contains text "Samsung Bioepis Co., Ltd." using fuzzy using stop words ( "co", "ltd") using thesauru

Re: [basex-talk] Full-Text Search with Stopwords: corner case hehavior

2016-01-03 Thread Ron Katriel
Thanks, Christian. I will look into the solution you suggested. Will need to cache the stop words to avoid repeatedly opening the file for reading. Ron On January 3, 2016 at 8:14:51 PM, Christian Grün (christian.gr...@gmail.com) wrote: > The behavior I am looking for is getting back false whene

Re: [basex-talk] Full-Text Search with Stopwords: corner case hehavior

2016-01-03 Thread Christian Grün
> The behavior I am looking for is getting back false whenever the text > following ‘contains text' is reduced to an empty string. Is there a simple > way of checking that? Hm, sounds easy, but I don’t have an easy answer to that. We should probably extend our ft:tokenize function to also take a
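
One possible way to get the behavior asked for here is to test, before running `contains text`, whether anything is left of the query once the stop words are removed. The following is a minimal sketch and not the solution proposed in the truncated reply: `local:has-search-terms` is a hypothetical helper, the stop-word values are taken from the example in this thread, and it assumes that the BaseX function `ft:tokenize` splits the string into word tokens (they are lower-cased again here to be safe).

```xquery
(: Stop words from the example in this thread; in practice the list
   would be read once from a file and reused. :)
declare variable $stop-words := ('medical', 'affairs');

(: Hypothetical helper: true if the query still contains at least
   one token that is not a stop word. :)
declare function local:has-search-terms($query as xs:string) as xs:boolean {
  exists(ft:tokenize($query)[not(lower-case(.) = $stop-words)])
};

let $input := 'Superior Laboratories'
let $query := 'Medical Affairs'
return local:has-search-terms($query) and
  $input contains text { $query } using stop words ('medical', 'affairs')
```

The stop-word list appears twice because the `using stop words ( ... )` clause only accepts string literals, not a variable.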

Re: [basex-talk] Full-Text Search with Stopwords: corner case hehavior

2016-01-03 Thread Ron Katriel
Hi Christian, The behavior I am looking for is getting back false whenever the text following ‘contains text' is reduced to an empty string. Is there a simple way of checking that? Thanks, Ron On January 3, 2016 at 7:41:47 PM, Christian Grün (christian.gr...@gmail.com) wrote: Hi Ron, >

Re: [basex-talk] Full-Text Search with Stopwords: corner case hehavior

2016-01-03 Thread Christian Grün
Hi Ron, > "Superior Laboratories" contains text { "Medical Affairs" } using stop > words ( "medical", "affairs" ) I’m pretty sure that "true" is the right answer here. I must admit that, due to the variety of options provided by the XQFT spec, it’s often not too obvious what’s going on. > is

[basex-talk] Full-Text Search with Stopwords: corner case hehavior

2016-01-03 Thread Ron Katriel
Hi, I noticed an unexpected behavior with full-text matching using stop words. The actual code is somewhat complex (it matches CT.gov trials with sponsor studies) but I was able to distill it to a simple expression:     "Superior Laboratories" contains text { "Medical Affairs" } using stop wor

Re: [basex-talk] Full-text search with ft:mark (linking back)

2015-12-31 Thread Christian Grün
Hi Günter, Try something like this: for $p in //p for $hits in ft:mark($p[.//text() contains text 'real']) return { $hits } Cheers, Christian On Wed, Dec 30, 2015 at 9:12 PM, kleist wrote: > Dear members, > given for example: ft:mark(//p[.//text() contains text 'real']). This will > yi
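
The element constructor in the query above was lost in the archive. A sketch of the same idea, with a hypothetical `hit` wrapper element and the BaseX function `db:path` used to record which document each marked paragraph came from, might look like this (it has to be run against an opened database, since `ft:mark` works on database nodes):

```xquery
(: Sketch: wrap each marked-up paragraph in an element that records
   the path of its source document. 'hit' and 'path' are illustrative
   names, not taken from the original message. :)
for $p in //p
for $hit in ft:mark($p[.//text() contains text 'real'])
return <hit path="{ db:path($p) }">{ $hit }</hit>
```

Iterating over `$p` first keeps a handle on the original database node, while `ft:mark` returns a modified copy that no longer belongs to the source document.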

[basex-talk] Full-text search with ft:mark (linking back)

2015-12-30 Thread kleist
Dear members, given for example: ft:mark(//p[.//text() contains text 'real']). This will yield some results. Is there any way to link back to the documents where p was found? I know it would work without ft:mark, but I need the <mark> tag inside the results, especially for fuzzy-search-results.

Re: [basex-talk] Full-text index with lots of data

2015-10-21 Thread Chuck Bearden
Okay, I figured it out. Evidently, I should have tested the evaluated value of 'text()' or '.' with the 'contains text' expression, not just the element itself. That's why both the simplified queries below use the full-text index. When I modify the WHERE clause of my query to read where $pa/perso

Re: [basex-talk] Full-text index with lots of data

2015-10-21 Thread Christian Grün
> Thanks for the suggestion. The full-text index is apparently not being > used. It is sometimes not obvious for the query optimizer how to rewrite a query to take full advantage of an index. You could try to start with a simple version of your query, see if the optimizer is used, and enhance it s

Re: [basex-talk] Full-text index with lots of data

2015-10-21 Thread Chuck Bearden
On Wed, Oct 21, 2015 at 3:13 AM, Christian Grün wrote: >> By giving 6G of RAM to the JVM I succeeded in building the full-text index, > > Good news! > >> but it doesn't seem to be making any difference in query time. > > Did you check the "query info" (either in the corresponding panel in > the GU

Re: [basex-talk] Full-text index with lots of data

2015-10-21 Thread Christian Grün
> By giving 6G of RAM to the JVM I succeeded in building the full-text index, Good news! > but it doesn't seem to be making any difference in query time. Did you check the "query info" (either in the corresponding panel in the GUI, or by using -V)? If it doesn't show the info "applying full-text

Re: [basex-talk] Full-text index with lots of data

2015-10-20 Thread Chuck Bearden
By giving 6G of RAM to the JVM I succeeded in building the full-text index, but it doesn't seem to be making any difference in query time. I have a slightly older copy of the data that is probably a hundred or so records smaller than the one that is indexed for full text, and my query takes about

Re: [basex-talk] Full-text index with lots of data

2015-10-20 Thread Chuck Bearden
Sorry for not using "Reply All" earlier. Setting FTINDEXSPLITSIZE to 2000 enabled the process to get a little further, if the meaning of each dot is the same. FTINDEXSPLITSIZE at default: ..|..|...

Re: [basex-talk] Full-text index with lots of data

2015-10-20 Thread Christian Grün
> Creating Database... > ..;..;..;..;..;..;.;..;.. Do you get any output after this line (I would have expected to see a stack trace, or at least an error message…)? > Where 'pure_20151019' is both the name of the database and the > subdirectory where all my XML files are. > > It could well be that

Re: [basex-talk] Full-text index with lots of data

2015-10-20 Thread Chuck Bearden
Thank you for the suggestion. I'm trying it now. Here's how I'm going about it: cfbearden@quirkstation:~/projects/Influuent$ basex -d BaseX 8.3 [Standalone] Try help to get more information. > set addcache true ADDCACHE: true > set ftindex true FTINDEX: true > create db pure_20151019 pure_2015101

Re: [basex-talk] Full-text index with lots of data

2015-10-20 Thread Christian Grün
Hi Chuck, Usually, 4G is more than enough to create a full-text index for 16G of XML. Obviously, however, that's not the case for your input data. You could try to distribute your documents across multiple databases. As an alternative, we could have a look at your data and try to find out what's going

[basex-talk] Full-text index with lots of data

2015-10-20 Thread Chuck Bearden
Hi all, I have about 16G of XML data in about 52000 files, and I was hoping to build a full-text index over it. I've tried two approaches: enable full-text indexing as I create the database and then loading the data, and creating the full-text index after loading the data. If I enable ADDCACHE and

Re: [basex-talk] Full text score with or

2014-11-23 Thread Christian Grün
Hi Andy, > Thanks, that works for me. I always prefer less complex code :-) so it would > be nice if this feature made a return at some point. So it did: In the latest snapshot, scores will again be propagated when using and, or, and predicates [2]. Cheers, Christian [1] http://files.basex.org/

Re: [basex-talk] Full text score with or

2014-04-16 Thread Andy Bunce
Hi Christian, Thanks, that works for me. I always prefer less complex code :-) so it would be nice if this feature made a return at some point. Regards /Andy On Fri, Apr 11, 2014 at 12:04 PM, Christian Grün wrote: > Hi Andy, > > as we haven't managed so far to formalize scoring propagation for

Re: [basex-talk] Full text score with or

2014-04-11 Thread Christian Grün
Hi Andy, as we haven't managed so far to formalize scoring propagation for all XQuery expressions, and as we thought that XQuery itself can also be used to combine score values, we have recently decided to reduce our scoring to the most essential operation (namely, "contains text"). One solution
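
Combining score values in XQuery itself, as suggested above, could look roughly like the following. It is a sketch under assumptions: the element markup of the original test data was stripped by the archive, so the `item`, `a` and `b` elements are invented for illustration, and taking the maximum of the two scores is just one possible way to combine them.

```xquery
(: Hypothetical test data; only the text values come from the thread. :)
let $data := (
  <item><a>red apple</a></item>,
  <item><a>blue lagoon</a></item>,
  <item><b>fish and chips</b></item>
)
for $item in $data
(: Score each condition separately with the XQFT "let score" clause. :)
let score $sa := $item/a contains text 'red'
let score $sb := $item/b contains text 'fish'
(: Combine the scores by hand; here the better of the two wins. :)
let $score := max(($sa, $sb))
where $score > 0
order by $score descending
return <hit score="{ $score }">{ $item/* }</hit>
```

Other combinations, such as the sum or a weighted average, only require changing the expression bound to `$score`.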

[basex-talk] Full text score with or

2014-04-10 Thread Andy Bunce
Hi, I want to score searches with multiple conditions e.g. let $data:=( red apple, blue lagoon, fish and chips) for $hit score $s in $data[ a contains text("red") or b contains text("fish")] return $s In 7.8.2 it always returns scores of zero. In versi

[basex-talk] Full text queries ok :)

2013-12-02 Thread Ingarao Maud
Hello everybody Thanks to your help Christian, our full text queries now work fine. I still don't understand exactly what the problem was and why we got 500 internal errors. Maybe a too heavy query and thus a memory problem? Anyway now with this kind of writing, the queries are complete

Re: [basex-talk] Full Text Module - ft:search function

2013-10-25 Thread Christian Grün
Hi John, I think you’ll be glad to check out the latest extensions of the ft:search functions [1]. Feel free to check out the 7.8 snapshot. Christian [1] http://docs.basex.org/wiki/Full-Text_Module#ft:search ___ 2013/9/26 John Best : > Dear Christian, > > The syntax of t

Re: [basex-talk] Full Text Module - ft:search function

2013-09-30 Thread Christian Grün
> But one thing, I think, is missing and that is the Window/Distance Options > for proximity search. Feel free to add more details here: https://github.com/BaseXdb/basex/issues/762 ___ BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https:

Re: [basex-talk] Full Text Module - ft:search function

2013-09-26 Thread Christian Grün
> Can these options be implemented.. ?? Yes, they could be implemented, but it may take a while to find a good formalization of all possible choices. Your suggestions are welcome.

[basex-talk] Full Text Module - ft:search function

2013-09-26 Thread John Best
Dear Christian, The syntax of the ft:search is - *ft:search($db as xs:string, $terms as item()*, $options as item()) as text()** In the $options part, I have seen various options like - Any Word, All Words, Phrase (for Exact Phrase), fuzzy and wildcards. These are really very good features. I a

Re: [basex-talk] Full Text Indexing 8 GB DB !!

2013-05-26 Thread Christian Grün
Hi John, creating a full-text index for a database of 8GB size may be a challenge if not enough main memory is available. I recommend you to play around with the FTINDEXSPLITSIZE argument [1]. If this doesn’t help, I would recommend you to split your data into several database instances. Please n

[basex-talk] Full Text Indexing 8 GB DB !!

2013-05-24 Thread John Best
Hi Team, I have an XML DB of approx. 8 GB in size. The size of the DB will keep increasing through monthly updates. Each monthly update will be around 10 MB in size. As the DB is storing text documents, I will have to rebuild the full-text index after updating, to speed up the process.

Re: [basex-talk] [FULL TEXT] Indexing text in several languages

2013-05-21 Thread Christian Grün
Hi Vincent, I’m sorry this is currently not possible with BaseX, mainly due to various conceptual issues. The following query let $in := yes oui return $in//*[text() contains text '...' using stemming] would e.g. require that the specified query text is stemmed differently, depend

[basex-talk] [FULL TEXT] Indexing text in several languages

2013-05-21 Thread Vincent
Dear BaseX Team, I have some text in different languages stored in my XML files (tagged with xml:lang attributes) and I'd like to perform efficient full text queries on it (using stemming, stopwords...). I planned to use full text index. I can't figure out how I can set full text index for diff

Re: [basex-talk] full text with search term in a variable gives no matches

2013-05-14 Thread Christian Grün
Hi Liam, this is unusual; for the referenced document, Using ".//text()", I get.. Paris: 2485 Cambridge: 1136 London: 2444 Oxford: 1865 boy: 180 …and using ".", I get… Paris: 2484 Cambridge: 1130 London: 2442 Oxford: 1860 boy: 179 I get similar results for CHOP = true and false. Have you tried

Re: [basex-talk] full text with search term in a variable gives no matches

2013-05-12 Thread Liam R E Quin
On Sun, 2013-05-12 at 10:41 +0200, Christian Grün wrote: > Hi Liam, > > > The following query gives me no results: I rebuilt the full text index and now get results. Yay! :-) To be sure, I re-indexed the document and rebuilt all the indexes. However, for $city in ("Paris", "Cambridge", "London"

Re: [basex-talk] full text with search term in a variable gives no matches

2013-05-12 Thread Christian Grün
Hi Liam, > The following query gives me no results: could you pass me on a document for testing? Your query seemed to do the right thing for the following document: Paris > for $city as xs:string in ("Paris", "Cambridge", "London", "Oxford", ...) > return ($city, fn:count(db:fulltext("with-so

[basex-talk] full text with search term in a variable gives no matches

2013-05-11 Thread Liam R E Quin
The following query gives me no results: for $city as xs:string in ("Paris", "Cambridge", "London", "Oxford") return ( $city, count(/dictionary/letter/entry[.//p//text() contains text {$city}]), " " ) However, BaseX rewrites it to for $city as xs:string in ("Paris", "Cambridge", "London",

Re: [basex-talk] Full-Text search, how to mark whole phrases

2013-04-19 Thread Liam R E Quin
On Fri, 2013-04-19 at 11:25 +0200, Bartosz Marciniak wrote: > Is it possible to enclose result in one element > hello world ? In general this is not possible using XQuery and XPath Full Text 1.0, and it's the single most-requested feature for a future version. Many vendors provide an extension t

Re: [basex-talk] Full-Text search, how to mark whole phrases

2013-04-19 Thread Christian Grün
Hi Bartosz, > Is it possible to enclose result in one element > hello world ? unfortunately this is not possible, because at the stage of marking the terms, we don’t have any information if the matched terms were the result of a single or phrase search. If all of your adjacent query terms will be

[basex-talk] Full-Text search, how to mark whole phrases

2013-04-19 Thread Bartosz Marciniak
Hi, I have placed a small document in a database with the full-text index enabled: Text with hello world phrase. The following expression: ft:mark(db:open('simple')//*[text() contains text 'hello world']) produces: Text with hello world phrase. Is it possible to enclose result in one element hello wo
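As a small illustration of the call discussed in this thread, the sketch below uses the optional second argument of ft:mark to choose the name of the marker element. The database name 'simple' is taken from the post; the marker name 'hit' is an arbitrary choice. As described in the post, each matched token is still wrapped separately, so this does not merge a phrase into one element.

```xquery
(: Sketch only: 'simple' is the database from the post, 'hit' is an arbitrary marker name. :)
(: In recent BaseX versions, db:open has been renamed to db:get. :)
ft:mark(
  db:open('simple')//*[text() contains text 'hello world'],
  'hit'
)
```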

Re: [basex-talk] Full Text Index giving error for ..7.5 GB XML Data

2012-12-18 Thread Dirk Kirsten
Hello John, did you follow the advice given in the error message to increase Java's heap size? For such a rather large database this is most likely necessary. Cheers, Dirk On Tue, Dec 18, 2012 at 9:26 AM, John Best wrote: > Dear XML Team, > > I have a XML DB of size 7.5GB. When creating an Ful

[basex-talk] Full Text Index giving error for ..7.5 GB XML Data

2012-12-18 Thread John Best
Dear XML Team, I have an XML DB of size 7.5 GB. When creating a Full-Text Index I get an error message: *Out of Main Memory. You can try to: - increase Java's heap size with the flag -Xmx - deactivate the text and attribute indexes.* I tried this with 7.3 and through the GUI... To make searching fast
