Sure, no problem. I've added a comment indicating my interest to SQOOP-2649.
Thanks, - Ken On Fri, Jan 8, 2016 at 8:33 PM James Taylor <[email protected]> wrote: > Sounds good, Ken. If you could comment on SQOOP-2649 that you'd be > interested in using that functionality if it were available, that'd be much > appreciated. > > Thanks, > James > > On Fri, Jan 8, 2016 at 5:04 PM, Ken Hampson <[email protected]> wrote: > >> Hi, James: >> >> Thanks for the reply. I took a look at the patch attached to >> PHOENIX-2434, and it does look like it has a good chance to solve the >> problem, as Postgres exports boolean values as 't' and 'f', which the patch >> now checks for. Once we're able to get to Phoenix 4.7.0, I'll definitely >> try it out to confirm. >> >> We don't and won't be able to use Flyway due to various architectural >> considerations, but we do use Sqoop to get data from Postgres into Hive. I >> had skimmed SQOOP-2649 when originally looking at how to bulk load into >> Phoenix, and figured CSV was the way to go for now since the SQOOP-2649 >> enhancement was still in patch form. It's definitely something I will keep >> an eye on going forward. >> >> Thanks again, >> - Ken >> >> >> On Fri, Jan 8, 2016 at 1:13 PM James Taylor <[email protected]> >> wrote: >> >>> Hi Ken, >>> PHOENIX-2434 improved our CSV handling of booleans and will appear in >>> our upcoming 4.7.0 release. It'd be good if you can confirm whether or not >>> this is what you need. We definitely want to support ingest of CSVs from >>> other RDBMSs. >>> >>> There are a couple of other avenues of ingest into Phoenix: >>> - using Flyway DB (http://flywaydb.org/) where Phoenix support has been >>> added (not sure if this has appeared in a release yet or not, though) >>> - through a pull request to add support for Phoenix in Scoop ( >>> http://sqoop.apache.org/). Unfortunately, this isn't getting much >>> attention in that community (if this is useful for you, you may want to >>> chime in on SQOOP-2649 or on their mailing list). >>> >>> Thanks, >>> James >>> >>> On Thu, Jan 7, 2016 at 5:28 PM, Ken Hampson <[email protected]> wrote: >>> >>>> Hi, all: >>>> >>>> I am in the process of importing some data from Postgres into Phoenix >>>> as part of a larger project. I noticed after doing so and cross-checking >>>> the results of a fairly complex JOIN query (of 6 tables of so) that the >>>> number of rows returned by Postgres and Phoenix differed -- Phoenix had a >>>> significant number more rows. I investigated this further, and ultimately >>>> determined that the reason was that all *boolean *values were imported >>>> as being false, regardless of their actual value. >>>> >>>> When I investigated that aspect further, I discovered that Phoenix >>>> appears to require the full strings "true" and "false" in the CSV in order >>>> to properly register their actual values. As a workaround, I am currently >>>> using a full SELECT in the "\copy" statement in psql to select each column >>>> individually, and for each boolean column, I am using a CASE statement to >>>> emit the full strings. I have since also learned that casting the boolean >>>> to the "text" datatype will also cause Postgres to emit the full string. >>>> >>>> In either case, though, the pain point is that this discrepancy >>>> requires that I examine each table closely for boolean columns, and if >>>> there are any, I then need to enumerate all the columns out explicitly and >>>> use either the CASE statement or the cast method on the boolean columns in >>>> order to get the values Phoenix expects. In my case, I didn't need to >>>> transform any of the data otherwise, so this forced a much more complex >>>> procedure. >>>> >>>> While psql provides many options to change various values -- delimiters >>>> and what not -- there is no option to change the boolean values. Likewise, >>>> as near as I can tell, there are no options to change what Phoenix expects, >>>> either. >>>> >>>> I am wondering: >>>> >>>> - Is there any way from within Phoenix to change the boolean values >>>> that it expects in the CSV? >>>> - Are there any other methods from a Phoenix perspective that you can >>>> think of that would obviate the need for the column-by-column enumeration >>>> procedure I outlined above? >>>> - Since Postgres is a major open source RDBMS which I suspect many >>>> people -- myself included -- will want to export data from into Phoenix, >>>> could there be an enhancement to Phoenix to allow an option to specify a >>>> default boolean value, or perhaps an option to specify the CSV source, from >>>> which the proper value could be defined and utilized automatically? >>>> >>>> Thanks, >>>> - Ken >>>> >>> >>> >
