Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-12 Thread Joe Wicentowski
Hi all, Forgive me. Rather than post more code in this thread, I've created a gist with revised code that resolves some inconsistencies in what I posted here earlier. https://gist.github.com/joewiz/7581205ab5be46eaa25fe223acda42c3 Again, this isn't a full-featured CSV parser by any means; it

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-12 Thread Joe Wicentowski
And corrected query body: let $csv := 'Author,Title,ISBN,Binding,Year Published Jeannette Walls,The Glass Castle,074324754X,Paperback,2006 James Surowiecki,The Wisdom of Crowds,9780385503860,Paperback,2005 Lawrence Lessig,The Future of Ideas,9780375505782,Paperback,2002 "Larry Bossidy, Ram

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-12 Thread Joe Wicentowski
Sorry, a typo crept in. Here's the corrected function: declare function local:get-cells($row as xs:string) as xs:string { (: workaround lack of lookahead support in XPath: end row with comma :) let $string-to-analyze := $row || "," let $analyze :=
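
The comma-appending workaround mentioned in the snippet above can be illustrated as follows. This is a minimal sketch, not the function from the thread or the gist (the original is cut off here): the regex, the unquoting logic, and the sample row are assumptions made for illustration. Because the row is terminated with a comma, every cell, including the last one, is followed by a delimiter, so no lookahead or lookbehind is needed.

```xquery
xquery version "3.1";

(: Sketch only: splits one CSV row into cells without lookahead/lookbehind.
   The regex and the unquoting rules are illustrative assumptions. :)
declare function local:get-cells($row as xs:string) as xs:string* {
  (: end the row with a comma so that every cell is followed by a delimiter :)
  let $string-to-analyze := $row || ","
  (: a cell is either a quoted string (with "" as an escaped quote)
     or a run of characters containing neither a comma nor a quote :)
  let $analyze := fn:analyze-string($string-to-analyze, '("([^"]|"")*"|[^,"]*),')
  for $match in $analyze/fn:match
  (: drop the trailing delimiter :)
  let $raw := fn:replace($match, ',$', '')
  return
    if (fn:starts-with($raw, '"'))
    (: strip the surrounding quotes and collapse doubled quotes :)
    then fn:replace(fn:substring($raw, 2, fn:string-length($raw) - 2), '""', '"')
    else $raw
};

(: hypothetical sample row: a quoted cell containing a comma, and an empty cell :)
local:get-cells('"Doe, Jane",Example Title,,2006')
```

For the hypothetical row, this should yield four strings: the quoted name with its comma intact, the title, an empty string, and the year. Text that does not fit the pattern (for example, an unbalanced quote) ends up in fn:non-match elements and is silently skipped, which is one reason this is not a full CSV parser.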

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-12 Thread Joe Wicentowski
Hi Christian, Yes, that sounds like the culprit. Searching back through my files, Adam Retter responded on exist-open (at http://markmail.org/message/3bxz55du3hl6arpr) to a call for help with the lack of lookahead support in XPath, by pointing to an XSLT he adapted for CSV parsing,

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-12 Thread Christian Grün
> Christian: I tried removing the quote escaping but still get an error. > Here's a small test to reproduce: > > fn:analyze-string($row, '(?:\s*(?:"([^"]*)"|([^,]+))\s*,?|(?<=,)(),?)+?') I assume it’s the lookbehind assertion that is not allowed in XQuery (but I should definitely spend more

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-12 Thread Joe Wicentowski
Hi all, Christian: I completely agree, CSV is a nightmare. One way to reduce the headaches (in, say, developing an EXPath CSV library) might be to require that CSV pass validation by a tool such as http://digital-preservation.github.io/csv-validator/. Adam Retter presented his work on CSV

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-12 Thread Christian Grün
I didn’t check the regex in general, but I think one reason it fails is the escaped quote. For example, the following query is illegal in XQuery 3.1… matches('a"b', 'a\"b') …whereas the following one is OK: matches('a"b', 'a"b') On Mon, Sep 12, 2016 at 1:15 PM, Hans-Juergen Rennau
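
To make the difference concrete, the two patterns can be compared side by side. This is a sketch under one assumption: that the processor reports the invalid pattern as the dynamic error err:FORX0002, which try/catch can intercept. A processor that rejects the pattern during static analysis may fail before the catch clause applies.

```xquery
xquery version "3.1";
declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: Sketch only: \" is not a valid escape in XPath/XQuery regular expressions,
   so the quote has to appear unescaped in the pattern. :)
let $input := 'a"b'
return (
  (: valid pattern: the quote is an ordinary character :)
  fn:matches($input, 'a"b'),
  (: invalid pattern: expected to raise err:FORX0002 :)
  try {
    fn:matches($input, 'a\"b')
  } catch err:FORX0002 {
    'invalid regular expression: ' || $err:description
  }
)
```

Inside an XQuery string literal the backslash has no special meaning, so it reaches the regex engine unchanged and is interpreted there as the start of an escape sequence.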

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-12 Thread Hans-Juergen Rennau
Cordial thanks, Liam - I was not aware of that! @Joe: Rule of life: when one is especially sure to be right, one is surely wrong, and so was I, and right were you(r first two characters). Liam R. E. Quin wrote at 5:54 on Monday, 12 September 2016: Hans-Jürgen, wrote: !

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-11 Thread Liam R. E. Quin
Hans-Jürgen, wrote: ! > Already the first two characters (? render the expression invalid: (1) An unescaped ? is an occurrence indicator, making the preceding entity optional (2) An unescaped ( is used for grouping, it does not represent anything => there is no entity preceding the ?

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-11 Thread Christian Grün
@Hans-Jürgen… Nice work, thanks for the hint! On Sun, Sep 11, 2016 at 10:23 PM, Hans-Juergen Rennau wrote: > Joe, just in case it is of interest to you: the TopicTools framework, > downloadable at > > https://github.com/hrennau/topictools > > contains an XQuery-implemented,

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-11 Thread Christian Grün
Hi Joe, My concern is that a single regex, no matter how complex, won’t do justice to parsing arbitrary CSV data. The CSV input we got so far for testing was simply too diverse (I spent 10% of my time on implementing a basic CSV parser in BaseX, and 90% on examining these special cases, and

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-11 Thread Hans-Juergen Rennau
Joe, just in case it is of interest to you: the TopicTools framework, downloadable at https://github.com/hrennau/topictools contains an XQuery-implemented, full-featured CSV parser (module _csvParser.xqm, 212 lines). Writing XQuery tools using the framework, the parser is automatically added

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-11 Thread Joe Wicentowski
Hans-Jürgen, I figured as much. I wonder if we can come up with an xsd-compliant regex for this purpose? It may not give us a full-featured CSV parser, but would handle reasonably uniform cases. Joe Sent from my iPhone On Sun, Sep 11, 2016 at 3:39 PM -0400, "Hans-Juergen Rennau"

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-11 Thread Hans-Juergen Rennau
Joe, concerning your regex, I would complain, too! Already the first two characters (? render the expression invalid: (1) An unescaped ? is an occurrence indicator, making the preceding entity optional (2) An unescaped ( is used for grouping, it does not represent anything => there is no

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-11 Thread Joe Wicentowski
Thanks for your replies and interest, Hans-Jürgen, Marc, Vincent, and Christian. The other day, short of a comprehensive solution, I went in search of a regex that would handle quoted values that contain commas that shouldn't serve as delimiters. I found one that worked in eXist but not in

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-11 Thread Christian Grün
Hi Joe, Thanks for your mail. You are completely right, using an array would be the natural choice with csv:parse. It’s mostly due to backward compatibility that we didn’t update the function. @All: I’m pretty sure that all of us would like having an EXPath spec for parsing CSV data. We still
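
As a rough illustration of why arrays are a natural fit, rows and cells can be modelled as nested XQuery 3.1 arrays. The sketch below does not use or reproduce the csv:parse API; it assumes a naive split on line breaks and commas (so quoted cells are not handled) and reuses two columns of the sample data quoted earlier in the thread.

```xquery
xquery version "3.1";

(: Sketch only: naive split, no support for quoted cells. :)
let $csv := 'Author,Title&#10;Jeannette Walls,The Glass Castle'
(: one outer array of rows, each row an array of cells :)
let $rows := array {
  for $line in fn:tokenize($csv, '\n')
  return array { fn:tokenize($line, ',') }
}
(: second row, first cell :)
return $rows(2)(1)
```

Positional lookup such as $rows(2)(1) addresses a cell directly, and an empty cell stays an empty string instead of disappearing, as it would in a flat sequence of sequences.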

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-08 Thread Hans-Juergen Rennau
..@mailman.uni-konstanz.de]On Behalf Of Hans-Juergen Rennau Sent: Thursday, September 08, 2016 10:02 AM To: Marc van Grootel <marc.van.groo...@gmail.com> Cc: BaseX <basex-talk@mailman.uni-konstanz.de> Subject: Re: [basex-talk] csv:parse in the age of XQuery 3.1   What concerns

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-08 Thread Lizzi, Vincent
Cc: BaseX <basex-talk@mailman.uni-konstanz.de> Subject: Re: [basex-talk] csv:parse in the age of XQuery 3.1 What concerns me, I definitely want the CSV as XML. But the performance problems have certainly nothing to do with XML versus CSV (I often deal with > 300 MB XML, which is parsed

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-08 Thread Hans-Juergen Rennau
As for me, I definitely want the CSV as XML. But the performance problems certainly have nothing to do with XML versus CSV (I often deal with > 300 MB XML, which is parsed very fast!) - it is the parsing operation itself which, if I'm not mistaken, is handled by XQuery code and which

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-08 Thread Marc van Grootel
I'm currently dealing with CSV a lot as well. I tend to use the format=map approach, but have not yet worked with files nearly as large as 22 MB. I'm wondering whether, and by how much, it is more efficient to deal with this type of data as array and map data structures rather than XML. For most processing I can leave serializing

Re: [basex-talk] csv:parse in the age of XQuery 3.1

2016-09-08 Thread Hans-Juergen Rennau
Joe, just to back you: I believe that an EXPath spec for CSV processing would be *extremely* useful! (There is hardly a format as ubiquitous as CSV.) And I had a similar experience concerning the performance - concretely, a 22 MB file proved to be simply unprocessable! Which means that BaseX